In one of our projects, we encountered a situation where the analytics team needed to work with data streams, but they had no programming skills. However, they did know how to work with SQL queries. It would have been nice to provide these analysts with a SQL layer on top of Kafka Streams.
KSQL is an SQL engine for Kafka that provides an interactive SQL interface that allows you to write queries for stream processing instead of writing a lot of code. KSQL is particularly well suited for fraud detection and real-time applications.
KSQL provides scalability and supports distributed stream processing operations, including aggregations, joins, windowing, etc. In addition, unlike SQL, where a database or batch processing system is invoked, query results in KSQL are produced continuously. Before moving on to writing threaded queries, let’s briefly review the basic concepts of KSQL.
An event stream is an unrestricted stream of individual events independent of each other, and an update or record stream is a stream of updates to previous records with the same key.
KSQL is based on the similar concept of queries from a Flow or Table. While a Flow is an infinite series of events or facts that do not change, in a Table, you can use a query to update facts or even delete them.
The terminology may vary, but the basic concepts are almost the same, and if you are familiar with Kafka Streams, you will feel confident with KSQL as well.
KSQL uses Kafka Streams to build and retrieve query results. KSQL consists of two components, the KSQL SLI and the KSQL server. It uses standard SQL tools such as MySql, Oracle, and even Hive can be used with the SLI when writing KSQL queries. The best of all versions of KSQL is the open source version (the licensed version of Apache 2.0).
The SLI is also a client that connects to the KSQL server. The KSQL server handles queries and retrieves data from Kafka and writes the results to Kafka.
KSQL works in two modes: standalone mode, which is used for prototyping, and development mode, or distributed mode, which is used with KSQL when working in a real-world data environment.
Although KSQL is an excellent tool as of this writing and works well for SQL data streaming, KSQL should be viewed as a developer preview tool that is not designed to work with production clusters.