KSQL engine introduction

Kafka co-creator Neha Narkhede published a blog post at Confluent introducing KSQL, Kafka's new streaming SQL engine. KSQL aims to lower the barrier to entry for stream processing by providing a simple and complete interactive SQL interface for working with data in Kafka. KSQL currently supports a variety of streaming operations, including aggregations, joins, time windows, sessions, and more.
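
To give a flavor of the syntax, here is a minimal sketch of a KSQL query. The pageviews topic and its columns are hypothetical, and the exact syntax (for example, whether a continuous query must end in EMIT CHANGES) varies between early KSQL and later ksqlDB versions:

    -- Register an existing Kafka topic as a stream (hypothetical topic and columns)
    CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
      WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON');

    -- Continuously count views per page over 1-minute tumbling windows
    SELECT page, COUNT(*) AS views
    FROM pageviews
    WINDOW TUMBLING (SIZE 1 MINUTE)
    GROUP BY page;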

 

Key differences from traditional SQL

KSQL is still quite different from the SQL used in relational databases. A traditional SQL statement is a one-shot operation: whether it is a query or an update, it runs once against the data set as it exists at that moment. KSQL queries and updates, by contrast, are continuous: they keep running as the data set grows. What KSQL really performs is a continuous transformation over the data, which is stream processing.
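
The contrast is easiest to see with a concrete query. Run against a relational database, the statement below scans the rows present at that moment and returns once; run in KSQL (against the hypothetical pageviews stream above), it keeps running and emits a new row for every matching event that arrives. Newer ksqlDB versions make this explicit with an EMIT CHANGES clause:

    -- One-shot in a relational database; continuous in KSQL
    SELECT user_id, page
    FROM pageviews
    WHERE page = '/checkout';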

 

KSQL use cases

1. Real-time monitoring

On the one hand, KSQL can define custom business-level metrics that are computed in real time. Low-level infrastructure metrics cannot tell us how an application is actually behaving, whereas custom metrics derived from the raw events the application produces give a much better picture of its health. On the other hand, KSQL can define a notion of correctness for an application and check whether it behaves as expected in production.
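
As an illustration, a custom metric of this kind might look like the following sketch, assuming a hypothetical error_events stream with a service_name column:

    -- Count errors per service over 1-minute tumbling windows
    -- (error_events and its columns are hypothetical)
    CREATE TABLE error_counts AS
      SELECT service_name, COUNT(*) AS error_count
      FROM error_events
      WINDOW TUMBLING (SIZE 1 MINUTE)
      GROUP BY service_name;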

 

2. Security detection

KSQL can transform event streams into numeric time series that visualization tools can display on a UI, making it possible to spot security-threatening behavior such as fraud or intrusion as it happens. KSQL offers a simple, complete, real-time solution for this kind of detection.
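
A minimal sketch of such a detection rule, assuming a hypothetical auth_attempts stream; the column names and the threshold are illustrative:

    -- Flag users with more than 5 failed logins inside a 30-second window
    -- (auth_attempts and its columns are hypothetical)
    CREATE TABLE suspicious_logins AS
      SELECT user_id, COUNT(*) AS failures
      FROM auth_attempts
      WINDOW TUMBLING (SIZE 30 SECONDS)
      WHERE success = false
      GROUP BY user_id
      HAVING COUNT(*) > 5;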

 

3. Online data integration

Most data processing pipelines go through an ETL (Extract-Transform-Load) process, and such systems usually rely on scheduled batch jobs, whose latency is unacceptable in many cases. With KSQL and Kafka connectors, batch data integration can be turned into online data integration. For example, joining a stream to a table makes it possible to enrich an event stream with metadata stored in a table, or to filter sensitive information out of the data before forwarding it to other systems.
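
A sketch of such an enrichment join, with a hypothetical orders stream and customers table:

    -- Enrich each order event with customer metadata via a stream-table join
    -- (orders, customers, and their columns are hypothetical)
    CREATE STREAM enriched_orders AS
      SELECT o.order_id, o.amount, c.region
      FROM orders o
      LEFT JOIN customers c ON o.customer_id = c.customer_id;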

 

4. Application development

For complex applications, Kafka's native Streams API may be more appropriate. For simpler applications, however, or for developers who would rather not program in Java, KSQL is the better choice.

 

KSQL architecture

KSQL runs as a standalone server, and multiple KSQL servers can form a cluster; server instances can be added dynamically. The cluster is fault-tolerant: if one server fails, the others take over its work. The KSQL command-line client sends queries to the cluster through a REST API, and can be used to inspect stream and table metadata, query data, and check the status of queries. Because KSQL is built on the Kafka Streams API, it inherits the elasticity, state management, and fault tolerance of the Streams API, including its exactly-once semantics. The KSQL server embeds these capabilities and adds a distributed SQL engine, automatic bytecode generation to improve query performance, and a REST API for submitting and managing queries.
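
For example, once the CLI is connected to a server, the cluster can be inspected with statements like these (the query ID shown is hypothetical):

    -- List registered streams and tables
    SHOW STREAMS;
    SHOW TABLES;

    -- List running persistent queries and inspect one of them
    SHOW QUERIES;
    EXPLAIN CSAS_ENRICHED_ORDERS_1;  -- hypothetical query ID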
