January 25, 2016

Kafka and more

In a companion post introducing Kafka, I observed that Kafka at its core is remarkably simple. Confluent offers a marchitecture diagram that illustrates what else is on offer, about which I’ll note:

- The red boxes, “Ops Dashboard” and “Data Flow Audit”, are the initial closed-source part. No surprise that they sound like management tools; that’s the traditional place for closed source add-ons to start.
- “Schema Management”:
  - Is used to define fields and so on.
  - Is not equivalent to what is ordinarily meant by schema validation, in that …
  - … it allows schemas to change, but puts constraints on which changes are allowed.
  - Is done in plug-ins that live with the producer or consumer of data.
  - Is based on the Hadoop-oriented file format Avro.
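
To make "schemas may change, but only in permitted ways" concrete, here is a minimal Python sketch. It does not use Confluent's or Avro's actual APIs. The function name `is_backward_compatible`, the `PageView` schemas, and the two rules it enforces (a newly added field must carry a default; an existing field may not change type) are illustrative assumptions, loosely modeled on how Avro resolves a newer reader schema against older data. Real Avro also permits certain type promotions, which this sketch ignores.

```python
from typing import List, Tuple


def is_backward_compatible(old_schema: dict, new_schema: dict) -> Tuple[bool, List[str]]:
    """Simplified check: can data written with old_schema be read with new_schema?

    Assumed rules (a simplification, not the full Avro resolution rules):
      * a field added in new_schema must declare a default
      * a field present in both schemas must keep the same type
      * dropping a field is allowed
    """
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        name = field["name"]
        if name not in old_fields:
            if "default" not in field:
                problems.append(f"new field '{name}' has no default")
        elif old_fields[name]["type"] != field["type"]:
            problems.append(f"field '{name}' changed type")
    return (not problems, problems)


# Hypothetical Avro-style record schemas, written as plain dicts.
v1 = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
    ],
}

# Allowed change: the new field is optional and has a default.
v2_ok = {
    "type": "record",
    "name": "PageView",
    "fields": v1["fields"]
    + [{"name": "referrer", "type": ["null", "string"], "default": None}],
}

# Disallowed change: the new field is required, so old records cannot be read.
v2_bad = {
    "type": "record",
    "name": "PageView",
    "fields": v1["fields"] + [{"name": "referrer", "type": "string"}],
}

ok_result = is_backward_compatible(v1, v2_ok)
bad_result = is_backward_compatible(v1, v2_bad)
```

In this sketch the check runs wherever the schema is registered or used, which fits the note above that schema management lives with the producer or consumer rather than in the broker.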

Kafka offers little in the way of analytic data transformation and the like. Hence, it’s commonly used with companion products.

- Per Confluent/Kafka honcho Jay Kreps, the companion is generally Spark Streaming, Storm or Samza, in declining order of popularity, with Samza running a distant third.
- Jay estimates that there’s such a companion product at around 50% of Kafka installations.
- Conversely, Jay estimates that around 80% of Spark Streaming, Storm or Samza users also use Kafka. On the one hand, that sounds high to me; on the other, I can’t quickly name a counterexample, unless Storm originator Twitter is one such.
- Jay’s views on the Storm/Spark comparison include:
  - Storm is more mature than Spark Streaming, which makes sense given their histories.
  - Storm’s distributed processing capabilities are more questionable than Spark Streaming’s.
  - Spark Streaming is generally used by folks in the heavily overlapping categories of:
    - Spark users.
    - Analytics types.
    - People who need to share stuff between the batch and stream processing worlds.
  - Storm is generally used by people coding up more operational apps.

If we recognize that Jay’s interests are obviously streaming-centric, this distinction maps pretty well to the three use cases Cloudera recently called out.

Complicating this discussion further is Confluent 2.1, which is expected late this quarter. Confluent 2.1 will include, among other things, a stream processing layer that works differently from any of the alternatives I cited, in that:

- It’s a library running in client applications that can interrogate the core Kafka server, rather than …
- … a separate thing running on a separate cluster.

The library will do joins, aggregations and so on, while relying on core Kafka for information about process health and the like. Jay sees this as more of a competitor to Storm in operational use cases than to Spark Streaming in analytic ones.
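
As a rough illustration of what "joins and aggregations in a client-side library" can mean, here is a small Python sketch that runs entirely inside one application process. It is not the Confluent 2.1 API, which this post does not describe. The function names (`build_table`, `join_with_table`, `count_by`) and the in-memory lists standing in for Kafka topics are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-ins for two Kafka topics. A real application would
# consume these from Kafka; here they are plain in-memory lists.
clicks = [("alice", "/home"), ("bob", "/pricing"), ("alice", "/docs")]
profiles = [("alice", "US"), ("bob", "DE")]


def build_table(changelog):
    """Materialize a changelog into a key -> latest-value lookup table."""
    table = {}
    for key, value in changelog:
        table[key] = value  # later entries overwrite earlier ones
    return table


def join_with_table(stream, table):
    """Stream-table join: enrich each event with the table value for its key."""
    for key, value in stream:
        if key in table:  # events with no matching key are dropped
            yield key, (value, table[key])


def count_by(stream, key_fn):
    """Aggregation: count records grouped by key_fn(record)."""
    counts = defaultdict(int)
    for record in stream:
        counts[key_fn(record)] += 1
    return dict(counts)


user_region = build_table(profiles)
enriched = list(join_with_table(clicks, user_region))
clicks_per_region = count_by(enriched, lambda rec: rec[1][1])
```

Nothing here needs a separate processing cluster; the join and the count are ordinary function calls in the application. What the sketch leaves out is exactly what the library would lean on core Kafka for: partitioning work across instances and tracking process health.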

We didn’t discuss other Confluent 2.1 features much, and frankly they all sounded to me like items from the “You mean you didn’t have that already??” list any young product has.

Related links

My October, 2014 post on Streaming for Hadoop is a sort of predecessor to this two-post series.

Categories: Data integration and middleware, Databricks, Spark and BDAS, EAI, EII, ETL, ELT, ETLT, Hadoop, Kafka and Confluent, Market share and customer counts, Streaming and complex event processing (CEP)

Comments

4 Responses to “Kafka and more”

Notes from a long trip, July 19, 2016 | DBMS 2 : DataBase Management System Services on July 22nd, 2016 9:05 am

[…] Back in January I wrote that the leaders were mainly Spark Streaming, followed by Storm. […]

Xi Liu on July 24th, 2016 2:14 am

Do you know about distributedlog : http://distributedlog.io/ ? It would be interesting to compare Kafka with it.

Introduction to data Artisans and Flink | DBMS 2 : DataBase Management System Services on August 24th, 2016 2:54 pm

[…] surveyed Spark Streaming, Storm et al. in […]

Condense on May 6th, 2025 3:25 am

Do you know guys condense is also fully managed by kafka platform also it is good to compare with confluent
