Kafka and Confluent
For starters:
- Kafka has gotten considerable attention and adoption in streaming.
- Kafka is open source, out of LinkedIn.
- Folks who built it there, led by Jay Kreps, now have a company called Confluent.
- Confluent seems to be pursuing a fairly standard open source business model around Kafka.
- Confluent seems to have somewhere in the low to mid teens of paying customers.
- Confluent believes thousands of Kafka clusters are in production.
- Confluent reports 40 employees and $31 million raised.
At its core Kafka is very simple:
- Kafka accepts streams of data in substantially any format, and then streams the data back out, potentially in a highly parallel way.
- Any producer or consumer of data can connect to Kafka, via what can reasonably be called a publish/subscribe model.
- Kafka handles various issues of scaling, load balancing, fault tolerance and so on.
So it seems fair to say:
- Kafka offers the benefits of hub vs. point-to-point connectivity.
- Kafka acts like a kind of switch, in the telecom sense. (However, this is probably not a very useful metaphor in practice.)
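To make the publish/subscribe and hub ideas above a bit more concrete, here is a toy in-memory sketch. It is not Kafka's client API; `ToyBroker`, `publish`, `poll` and `connection_counts` are names invented for illustration, and the partitioning rule is an arbitrary stand-in. Scaling, load balancing and fault tolerance are left out entirely.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub/sub hub (hypothetical, not the Kafka API)."""

    def __init__(self, partitions=2):
        self.partitions = partitions
        # topic -> list of partitions, each an append-only list of messages
        self.logs = defaultdict(lambda: [[] for _ in range(partitions)])
        # (group, topic, partition) -> next offset that group should read
        self.offsets = defaultdict(int)

    def publish(self, topic, key, value):
        # Arbitrary deterministic rule: the same key always maps to the same partition.
        partition = sum(key.encode()) % self.partitions
        self.logs[topic][partition].append(value)
        return partition

    def poll(self, group, topic):
        # Each consumer group keeps its own position, so groups read independently.
        out = []
        for p, log in enumerate(self.logs[topic]):
            start = self.offsets[(group, topic, p)]
            out.extend(log[start:])
            self.offsets[(group, topic, p)] = len(log)
        return out


def connection_counts(producers, consumers):
    # Point-to-point wires every producer to every consumer; a hub needs one link each.
    return producers * consumers, producers + consumers


broker = ToyBroker()
# Payloads are opaque bytes, so the broker does not care about their format.
broker.publish("clicks", "user-1", b"home")
broker.publish("clicks", "user-2", b"cart")
broker.publish("clicks", "user-1", b"checkout")

analytics = broker.poll("analytics", "clicks")  # one subscriber group
audit = broker.poll("audit", "clicks")          # a second, independent group
```

Both groups see all three messages, and `connection_counts(5, 4)` gives 20 point-to-point links versus 9 through a hub, which is the connectivity benefit in miniature.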
My favorite educational video
My favorite educational video growing up, by far, was a 1960 film embedded below. I love it because it pranks its viewers, starting right in the opening scene. (Start at the 0:50 mark to see what I mean.)
If you’re ever in the position of helping a kid or young adult understand physics, this video could be a great help. Frankly, it could help in political discussions as well …
MongoDB 3.0
Old joke:
- Question: Why do policemen work in pairs?
- Answer: One to read and one to write.
A lot has happened in MongoDB technology over the past year. For starters:
- The big news in MongoDB 3.0* is the WiredTiger storage engine. The top-level claims for that are that one should “typically” expect (individual cases can of course vary greatly):
  - 7-10X improvement in write performance.
  - No change in read performance (which however was boosted in MongoDB 2.6).
  - ~70% reduction in data size due to compression (disk only).
  - ~50% reduction in index size due to compression (disk and memory both).
- MongoDB has been adding administration modules.
  - A remote/cloud version came out with, if I understand correctly, MongoDB 2.6.
  - An on-premise version came out with 3.0.
  - They have similar features, but are expected to grow apart from each other over time. They have different names.
*Newly-released MongoDB 3.0 is what was previously going to be MongoDB 2.8. My clients at MongoDB finally decided to give a “bigger” release a new first-digit version number.
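As a rough back-of-the-envelope illustration of the compression claims above, the sketch below applies the “typical” 70% and 50% figures to a made-up deployment. The function name and the 1,000 GB / 100 GB inputs are assumptions for illustration only, and actual results can of course vary greatly.

```python
def estimate_wiredtiger_footprint(data_gb, index_gb,
                                  data_reduction_pct=70, index_reduction_pct=50):
    """Apply the quoted 'typical' reductions; a sizing sketch, not a benchmark."""
    # Data compression is claimed for disk only.
    data_on_disk = data_gb * (100 - data_reduction_pct) / 100
    # Index compression is claimed for both disk and memory.
    index_after = index_gb * (100 - index_reduction_pct) / 100
    return {
        "disk_gb": data_on_disk + index_after,
        "index_memory_gb": index_after,
    }


# Hypothetical deployment: 1,000 GB of data and 100 GB of indexes.
estimate = estimate_wiredtiger_footprint(data_gb=1000, index_gb=100)
print(estimate)  # {'disk_gb': 350.0, 'index_memory_gb': 50.0}
```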
To forestall confusion, let me quickly add:
Categories: Database compression, Hadoop, Humor, In-memory DBMS, MongoDB, NoSQL, Open source, Structured documents, Sybase
Cautionary tales
Before the advent of cheap computing power, statistics was a rather dismal subject. David Lax scared me off from studying much of it by saying that 90% of statistics was done on sets of measure 0.
The following cautionary tale also dates to that era. Other light verse below.
Categories: Humor, Predictive modeling and advanced analytics
Three old jokes
Modern analytics described in three old jokes.
The drunk under the lamppost
A man is on his hands and knees, looking for something under a lamppost and obviously not finding it. The neighborhood policeman asks what he is doing.
“I’m looking for my keys.”
“Did you lose them around here?”
“Not exactly; I think they fell out of my pocket down the street a bit.”
“Then why aren’t you looking for them down the street?”
“The light is better over here.”
But seriously
Some people use statistics the way a drunk uses a lamppost — more for support than for illumination.
Seek and …
A family that’s looking to start organic gardening has a large pile of manure dumped into their backyard. Their daughter grabs a shovel and digs in excitedly, shouting:
“Look at all this … stuff! There must be a pony in here somewhere!!”
Categories: Humor
Comments on Oracle’s third quarter 2012 earnings call
Various reporters have asked me about Oracle’s third quarter 2012 earnings conference call. Specific Q&A includes:
What did Oracle do to have its earnings beat Wall Street’s estimates?
Have a bad second quarter and then set Wall Street’s expectations too low for Q3. This isn’t about strong results; it’s about modest expectations.
Can Oracle be a leader in both hardware and software?
- It’s not inconceivable.
- The observation that Oracle, IBM, and Teradata all are pushing hardware-software combinations has been intriguing ever since IBM bought Netezza. (SAP really isn’t, however; ditto Microsoft.)
- I do think Oracle may be somewhat overoptimistic as to how cooperative the Sun user base will be in buying more high-end product and in paying more in maintenance for the gear they already have.
Beyond that, please see below.
What about Oracle in the cloud?
MySQL is an important cloud supplier. But Oracle overall hasn’t demonstrated much understanding of what cloud technology and business are all about. An expensive SaaS acquisition here or there could indeed help somewhat, but it seems as if Oracle still has a very long way to go.
Other comments
Other comments on the call, whose transcript is available, include:
Categories: Cloud computing, Exadata, Humor, In-memory DBMS, Oracle, SAP AG, Software as a Service (SaaS)
Metaphors amok
It all started when I disputed James Kobielus’ blogged claim that Hadoop is the nucleus of the next-generation cloud EDW. Jim posted again to reiterate the claim, only this time he wrote that all EDW vendors [will soon] bring Hadoop into the heart of their architectures. (All emphasis mine.)
That did it. I tweeted, in succession:
- Actually, I vote for Hadoop as the lungs of the EDW — first place of entry for essential nutrients.
- Data integration can be the heart of the EDW, pumping stuff around. RDBMS/analytic platform can be the brain.
- iPad-based dashboards that may engender envy, but which actually are only used occasionally and briefly … well, you get the picture.*
*Woody Allen said in Sleeper that the brain was his second-favorite organ.
Of course, that body of work was quickly challenged. Responses included:
Categories: Analytic technologies, Business intelligence, Data warehousing, EAI, EII, ETL, ELT, ETLT, Fun stuff, Hadoop, Humor, MapReduce
Notes and links October 3 2010
Some notes, follow-up, and links before I head out to California:
Categories: GIS and geospatial, Google, HP and Neoview, Humor, Kickfire, Netezza, Solid-state memory, Teradata, Web analytics
Some interesting links
In no particular order:
Categories: Business intelligence, EnterpriseDB and Postgres Plus, Fun stuff, Hadoop, Humor, In-memory DBMS, MapReduce, Memory-centric data management, Open source, Oracle, SAP AG
The Wonderful One-Hoss Shay
I often write of Bottleneck Whack-A-Mole, an engineering approach that ensues when parts of a system are out of balance. Well, the flip side of that is the One-Hoss Shay, as in Oliver Wendell Holmes’ marvelous poem. (Here’s a version with Howard Pyle illustrations.)
Categories: Humor, Theory and architecture