Why do large Internet companies love to use Kafka?

What is Kafka?

Kafka is generally used in real-time streaming data architectures to provide real-time analytics.

Kafka is exploding. More than a third of Fortune 500 companies use Kafka. These include the top ten travel companies, seven of the top ten banks, eight of the top ten insurance companies, nine of the top ten telecom companies, and many more. LinkedIn, Microsoft, and Netflix process on the order of a trillion (1,000,000,000,000) messages a day with Kafka. Kafka is used for real-time data streams, to collect big data, or to do real-time analysis (or both). Kafka is used with in-memory microservices to provide durability, and it can be used to feed events to CEP (complex event processing) systems and IoT/IFTTT-style automation systems.


Why Kafka?

Kafka is commonly used in real-time streaming data architectures to provide real-time analytics. Because Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system, it is used in cases where JMS, RabbitMQ, and AMQP may not even be considered because of the data volume and responsiveness required. Kafka's higher throughput, reliability, and replication characteristics make it suitable for things like tracking service calls (every call is tracked) or tracking IoT sensor data, where a traditional MOM (message-oriented middleware) might not be considered.

Kafka can work with Flume/Flafka, Spark Streaming, Storm, HBase, Flink, and Spark to ingest, analyze, and process streaming data in real time. Kafka is a data stream used to feed Hadoop big data lakes. Kafka brokers support massive message streams for low-latency follow-up analysis in Hadoop or Spark. In addition, Kafka Streams (a subproject) can be used for real-time analytics.

Kafka use cases

In short, Kafka is used for stream processing, website activity tracking, metrics collection and monitoring, log aggregation, real-time analytics, CEP, ingesting data into Spark, ingesting data into Hadoop, CQRS, message replay, error recovery, and as a guaranteed distributed commit log for in-memory computing (microservices).

Who uses Kafka?

Many large companies that handle large amounts of data use Kafka. LinkedIn, where Kafka originated, uses it to track activity data and operational metrics. Twitter uses it as part of Storm to provide a stream processing infrastructure. Square uses Kafka as a bus to move all system events (logs, custom events, metrics, and so on) to its various data centers, with output going to Splunk, to Graphite (dashboards), and to Esper-like/CEP alerting systems. Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, CloudFlare, Netflix, and other companies use it as well.

Why is Kafka so popular?

Kafka is operationally simple. Kafka is easy to set up and use, and it is easy to understand how it works. However, the main reason Kafka is so popular is its outstanding performance. It is stable, provides reliable durability, has a flexible publish-subscribe/queue model that scales well to N consumer groups, has robust replication, gives producers tunable consistency guarantees, and preserves ordering at the shard level (i.e., within each Kafka topic partition). In addition, Kafka works well with systems that have data streams to process, and it lets those systems aggregate, transform, and load data into other stores. But none of these features would matter if Kafka were slow. Its exceptional performance is the most important reason Kafka is popular.
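Ordering is guaranteed only within a partition, so records that must stay in order are usually given the same key. The sketch below is a toy model in plain Python, not the Kafka client API: it picks a partition by hashing the key, using `zlib.crc32` as a stand-in for the murmur2 hash that Kafka's Java producer applies to keyed records. The partition count and the events are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical partition count for this toy topic


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition, so the same key always goes to the same partition.
    crc32 is a stand-in; Kafka's Java producer uses murmur2 for keyed records."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records: List[Tuple[str, str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (key, value) to its partition, keeping send order within the partition."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key)].append((key, value))
    return partitions


events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"),
          ("user-2", "logout"), ("user-1", "logout")]
topic = produce(events)

# Every event for user-1 sits in one partition, in the order it was sent.
p = choose_partition("user-1")
print([value for key, value in topic[p] if key == "user-1"])  # ['login', 'click', 'logout']
```

Different keys may share a partition, but a single key never spreads across partitions, which is what makes per-key ordering possible.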

Why is Kafka so fast?

Kafka relies heavily on the OS kernel to move data quickly. It relies on the principle of zero copy (passing data from the OS page cache straight to the network socket instead of copying it through the application). Kafka lets you batch data records into chunks, and these batches can be seen end to end: from the producer, to the file system (the Kafka topic log), to the consumer. Batching allows more efficient data compression and reduces I/O latency. Kafka writes its immutable commit log to disk sequentially, which avoids random disk access and slow disk seeks. Kafka scales horizontally through sharding: it splits a topic log into hundreds (possibly thousands) of partitions spread across thousands of servers. This sharding allows Kafka to handle massive loads.
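Batching helps compression because similar records compressed together share redundancy that separately compressed records cannot. The plain-Python sketch below uses hypothetical sensor records, with `zlib` standing in for Kafka's codecs (gzip, snappy, lz4, zstd), and compares compressing 500 small records one by one against compressing them as a single batch. In a real producer, batching and compression are tuned with settings such as `batch.size`, `linger.ms`, and `compression.type`.

```python
import json
import zlib

# Hypothetical sensor records with a repetitive structure, as IoT readings often have.
messages = [
    json.dumps({"sensor": "temp-01", "reading": 20 + i % 5, "unit": "C"}).encode("utf-8")
    for i in range(500)
]

raw = sum(len(m) for m in messages)

# Compressing each record on its own: repeated field names cannot be shared.
individually = sum(len(zlib.compress(m)) for m in messages)

# Compressing the records as one batch: redundancy across records is exploited.
batched = len(zlib.compress(b"\n".join(messages)))

print(f"raw={raw} bytes, one by one={individually} bytes, as a batch={batched} bytes")
```

The batched size comes out far smaller than the per-record total, which is the effect Kafka gets by compressing whole record batches.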

Kafka Streaming Architecture

Kafka is most commonly used to stream data in real time into other systems. Kafka is a middle layer that decouples your real-time data pipelines. Kafka core is not suited to direct computations such as data aggregation or CEP. Kafka Streams, part of the Kafka ecosystem, provides the ability to do real-time analytics. Kafka can feed fast-lane systems (real-time and operational data systems) such as Storm, Flink, and Spark Streaming, as well as your services and CEP systems. Kafka is also used to stream data for batch data analysis. Kafka feeds Hadoop: it streams data into your big data platform, or into an RDBMS, Cassandra, Spark, or even S3 for future analysis. These data stores typically support data analysis, reporting, data science crunching, compliance auditing, and backups.

(Figure: Kafka streaming architecture diagram)

To sum up, what is Kafka?

  • Kafka is a distributed streaming platform used to publish and subscribe to streams of records. Kafka is also used for fault-tolerant storage.

  • Kafka replicates topic log partitions to multiple servers. Kafka is designed to let your applications process records as they occur.

  • Kafka is fast because it uses IO efficiently by batching and compressing records. Kafka is used to decouple data streams.

  • Kafka is used to stream data into data lakes, applications, and real-time stream analytics systems.
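Decoupling also shows up in how consumers share work. Within one consumer group, each partition is read by exactly one consumer (queue semantics), while different groups each read every partition independently (publish-subscribe). The sketch below models this with a simple round-robin assignment in plain Python; it is loosely inspired by Kafka's round-robin assignor strategy but is not its implementation, and the group and consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin partitions across the consumers of ONE group.
    Each partition goes to exactly one consumer in the group (queue semantics)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# Two independent groups each receive every partition (publish-subscribe across groups).
analytics = assign_partitions(partitions, ["analytics-a", "analytics-b", "analytics-c"])
archiver = assign_partitions(partitions, ["archiver-a"])

print(analytics)  # {'analytics-a': [0, 3], 'analytics-b': [1, 4], 'analytics-c': [2, 5]}
print(archiver)   # {'archiver-a': [0, 1, 2, 3, 4, 5]}
```

Adding consumers to a group spreads its partitions more thinly, while adding a new group does not take any data away from existing groups.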

(Figure: Kafka data streams)

Is Kafka polyglot?

Communication between Kafka clients and servers uses a versioned wire protocol over TCP. Kafka is committed to maintaining backward compatibility with older clients, and many languages are supported: there are clients for C#, Java, C, Python, Ruby, and other languages. The Kafka ecosystem also provides a REST proxy that allows easy integration via HTTP and JSON, making integration even easier. Kafka also supports Avro schemas via the Confluent Schema Registry for Kafka. Avro and the Schema Registry let clients in many programming languages produce and read complex records, and they allow those records to evolve. Kafka is truly polyglot.
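Record evolution usually means a newer schema adds a field with a default, so records written by older clients can still be read. The plain-Python sketch below mimics that resolution rule (missing fields take the reader's default, fields the reader does not know are ignored); it is not the Avro library or the Schema Registry API, and the schema and record are hypothetical.

```python
from typing import Any, Dict

REQUIRED = object()  # marker meaning "this field has no default"

# Hypothetical reader schema, expressed as {field_name: default}.
# Version 2 adds "email"; it carries a default so version-1 records stay readable.
READER_SCHEMA_V2: Dict[str, Any] = {"id": REQUIRED, "name": REQUIRED, "email": "unknown"}


def resolve(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record written with another schema version onto the reader's schema.
    Missing fields take the reader's default; fields the reader does not know are ignored."""
    result: Dict[str, Any] = {}
    for field, default in reader_schema.items():
        if field in record:
            result[field] = record[field]
        elif default is not REQUIRED:
            result[field] = default
        else:
            raise ValueError(f"field {field!r} is missing and has no default")
    return result


old_record = {"id": 42, "name": "Ada"}  # written by a client still using version 1
print(resolve(old_record, READER_SCHEMA_V2))  # {'id': 42, 'name': 'Ada', 'email': 'unknown'}
```

A new field without a default would make old records unreadable, which is why schema registries typically check compatibility before accepting a new schema version.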

Kafka is useful

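Kafka allows you to build real-time streaming data pipelines. Kafka enables in-memory microservices (i.e., actors, Akka, Baratine.io, QBit, reactors, reactive, Vert.x, RxJava, Spring Reactor). Kafka allows you to build real-time streaming applications that react to streams to perform real-time data analytics, transform, react, aggregate, and join real-time data flows, and perform CEP (complex event processing).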
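Aggregating a stream usually means grouping records into time windows. The sketch below counts events per key in one-minute tumbling windows, the kind of windowed aggregation that Kafka Streams offers; it is plain Python rather than the Kafka Streams API, and the window size and click events are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 60_000  # hypothetical one-minute tumbling window


def tumbling_counts(events: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count events per key in fixed, non-overlapping windows.
    Each event is (timestamp_ms, key); its window starts at the timestamp rounded down."""
    windows: Dict[int, Counter] = {}
    for ts, key in events:
        start = ts - ts % WINDOW_MS
        windows.setdefault(start, Counter())[key] += 1
    return windows


clicks = [(1_000, "home"), (30_000, "cart"), (59_999, "home"), (60_000, "home"), (125_000, "cart")]
for start, counts in sorted(tumbling_counts(clicks).items()):
    print(start, dict(counts))
# 0 {'home': 2, 'cart': 1}
# 60000 {'home': 1}
# 120000 {'cart': 1}
```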

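You can use Kafka to help gather metrics/KPIs, aggregate statistics from many sources, and implement event sourcing. You can use it with (in-memory) microservices and actor systems to implement in-memory services (an external commit log for distributed systems).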

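You can use Kafka to replicate data between nodes, to re-sync nodes, and to restore state. Although Kafka is mostly used for real-time data analytics and stream processing, you can also use it for log aggregation, messaging, click-stream tracking, audit trails, and much more.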
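Restoring state works because the topic is an ordered, durable log: a node that has lost its in-memory state can replay the events from the beginning and arrive at the same result. The sketch below shows this event-sourcing idea in plain Python, with a list standing in for a topic partition and hypothetical account events.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical topic contents: (account, amount) events in offset order.
event_log: List[Tuple[str, int]] = [
    ("alice", 100), ("bob", 50), ("alice", -30), ("bob", 25), ("alice", 10),
]


def rebuild_state(log: List[Tuple[str, int]], upto_offset: Optional[int] = None) -> Dict[str, int]:
    """Rebuild account balances by replaying events from offset 0.
    If upto_offset is given, only events before that offset are applied."""
    balances: Dict[str, int] = {}
    for account, amount in log[:upto_offset]:
        balances[account] = balances.get(account, 0) + amount
    return balances


print(rebuild_state(event_log))     # {'alice': 80, 'bob': 75}
print(rebuild_state(event_log, 2))  # {'alice': 100, 'bob': 50}
```

Replaying only up to a given offset also shows how a consumer can reconstruct the state as it was at an earlier point in the log.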

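In a world where data science and analytics are a big deal, capturing data to feed into data lakes and real-time analytics systems is also a big deal. And because Kafka can hold up under these demanding use cases, Kafka is a big deal.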

Kafka has scalable message storage

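Kafka is a good storage system for records/messages. Kafka acts like a high-speed file system for commit log storage and replication. These characteristics make Kafka useful for all kinds of applications. Records written to Kafka topics are persisted to disk and replicated to other servers for fault tolerance. Since modern drives are fast and quite large, this fits well and is very useful. Kafka producers can wait for acknowledgment, so a write is not considered complete by the producer until the message has been replicated. Kafka's disk structures scale well: modern disk drives have very high throughput when writing in large streaming batches. Also, Kafka clients and consumers can control their read position (offset), which allows use cases such as replaying the log after a critical bug (i.e., fix the bug and replay). And since offsets are tracked per consumer group, as we discussed in the Kafka architecture article, consumers can be quite flexible (i.e., they can replay the log).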
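Because the log is retained and each consumer group keeps its own offset, a group can rewind and reprocess records without affecting other groups. The toy class below illustrates this in plain Python; the class and its methods are hypothetical, loosely mirroring what `KafkaConsumer.poll`, `commitSync`, and `seek` do in the real Java client. As a simplification, the offset is committed right after each read.

```python
from typing import Dict, List


class PartitionLog:
    """Toy append-only partition: records are addressed by their offset (list index)."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.committed: Dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, value: str) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)  # commit immediately (simplification)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move one group's position without affecting any other group."""
        self.committed[group] = offset


log = PartitionLog()
for v in ["a", "b", "c", "d"]:
    log.append(v)

print(log.poll("billing"))   # ['a', 'b', 'c', 'd']
print(log.poll("audit", 2))  # ['a', 'b']
log.seek("billing", 1)       # e.g. after fixing a bug, reprocess from offset 1
print(log.poll("billing"))   # ['b', 'c', 'd']
print(log.poll("audit"))     # ['c', 'd']
```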

Kafka has record retention

A Kafka cluster retains all published records. If you do not set a limit, it will keep records until it runs out of disk space. You can, for example, set a retention policy of three days, two weeks, or a month. Records in the topic log remain available for consumption until they are discarded by time, size, or compaction. Consumption speed is not affected by the size of the log, because Kafka always writes to the end of the topic log.
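Time-based retention can be pictured as dropping everything older than a cutoff. The sketch below is a simplification in plain Python with hypothetical timestamps: real brokers delete whole log segments rather than individual records, and the policy is set through configuration such as the topic-level `retention.ms` and `retention.bytes` or the broker-level `log.retention.hours`. A three-day policy corresponds to `retention.ms` = 3 * 86,400,000 = 259,200,000.

```python
from typing import List, Tuple

DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000 ms in a day
RETENTION_MS = 3 * DAY_MS     # a three-day policy: 259,200,000 ms


def apply_time_retention(records: List[Tuple[int, str]], now_ms: int,
                         retention_ms: int = RETENTION_MS) -> List[Tuple[int, str]]:
    """Keep only records whose timestamp is within retention_ms of now_ms.
    Simplification: a real broker deletes whole log segments, not single records."""
    cutoff = now_ms - retention_ms
    return [(ts, value) for ts, value in records if ts >= cutoff]


now = 10 * DAY_MS  # hypothetical "current time"
records = [(6 * DAY_MS, "old"), (7 * DAY_MS, "edge"), (9 * DAY_MS, "recent")]
print(apply_time_retention(records, now))  # [(604800000, 'edge'), (777600000, 'recent')]
```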



Source: blog.51cto.com/14288256/2400720