Memory-centric data management
Analysis of technologies that manage data entirely or primarily in random-access memory (RAM). Related subjects include:
- Oracle TimesTen
- solidDB
- QlikTech
- SAP’s BI Accelerator
- Exasol
- Solid-state memory as a replacement for disk
Truviso and EnterpriseDB blend event processing with ordinary database management
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB’s Bob Zurek endorsed my tentative summary of what this means technically, namely:
There’s data being managed transactionally by EnterpriseDB.
Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
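I haven’t seen the blade itself, so here’s only a toy analogy of the architecture, in Python rather than Truviso’s SQL dialect: a persistent relational store (sqlite3 standing in for Postgres Plus) plus a stream processor (a generator and a sliding window standing in for Truviso) that enriches live events by joining them against the transactionally managed tables. All names and the schema are invented for illustration.

```python
import sqlite3
from collections import deque

# Toy stand-in for the architecture: a persistent relational store
# (sqlite3, standing in for Postgres Plus) plus a stream processor
# (standing in for Truviso) that joins live events against stored data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")
db.commit()

def order_stream():
    # Stand-in for a live event feed; each event is (customer_id, amount).
    yield from [(1, 100.0), (2, 250.0), (1, 75.0)]

def windowed_totals(events, window=2):
    """Maintain a sliding-window total per customer, enriched from the store."""
    recent = deque(maxlen=window)
    for cust_id, amount in events:
        recent.append((cust_id, amount))
        # Join the streaming event against persistently managed data.
        (name,) = db.execute(
            "SELECT name FROM customers WHERE id = ?", (cust_id,)
        ).fetchone()
        total = sum(a for c, a in recent if c == cust_id)
        print(f"{name}: {total} in last {window} events")

windowed_totals(order_stream())
```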
Optimizing WordPress database usage
There’s an amazingly long comment thread on Coding Horror about WordPress optimization. Key points and debates include:
- WordPress makes scads of database calls on every page. (20 is the supposed default number. That sounds a little high to me, but not wholly incredible.)
- Therefore one should use a caching plug-in. WP-Cache is the preferred one. WP-Super-Cache gets some votes as perhaps being even better. (A minimal sketch of the basic caching idea appears after this list.)
- In theory the database cache should handle most of the problem. (After all, many of those database queries are the same for every page.) In practice, it often doesn’t, even if you use dedicated (as opposed to shared) web hosting.
- LAMP vs. Microsoft stack (uh-oh).
- Drupal vs. WordPress vs. Movable Type vs. Joomla vs. do-it-yourself (uh-oh too).
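For concreteness, here’s a minimal sketch of what a page-caching plug-in does conceptually: serve a stored copy of the rendered page and skip the database entirely until a time-to-live expires. This illustrates the idea, not WP-Cache’s actual code; the names and TTL are invented.

```python
import time

# Minimal sketch of what a page-caching plug-in does: serve a stored
# copy of the rendered page and skip the database entirely,
# regenerating only after a time-to-live expires.
_cache = {}  # url -> (rendered_html, timestamp)
TTL_SECONDS = 300

def render_page_from_database(url):
    # Stand-in for the ~20 queries WordPress runs per page.
    return f"<html>expensive page for {url}</html>"

def get_page(url):
    cached = _cache.get(url)
    if cached and time.time() - cached[1] < TTL_SECONDS:
        return cached[0]               # cache hit: zero database calls
    html = render_page_from_database(url)
    _cache[url] = (html, time.time())  # cache miss: render once, store
    return html
```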
Another theme is — well, it’s WordPress “theme” design. Do you really need all those calls? The most dramatic example I can think of is one I experienced soon after I started this blog. Some themes have the cool feature that, in the category list on the sidebar, there’s a count of the number of posts in the category. Each category. I love that feature, but its performance consequences are not pretty.
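To see why, compare the naive approach (one COUNT(*) query per category, i.e., N+1 queries per page view) with a single GROUP BY that fetches every count at once. The schema below is a toy, not WordPress’s actual tables:

```python
import sqlite3

# Naive theme approach: one COUNT(*) query per sidebar category.
# Better: a single GROUP BY that returns all counts at once.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, category TEXT);
    INSERT INTO posts (category) VALUES ('Cache'), ('Cache'), ('OLTP');
""")

# Naive: one query per category shown in the sidebar (N+1 pattern).
categories = [r[0] for r in db.execute("SELECT DISTINCT category FROM posts")]
slow = {
    c: db.execute("SELECT COUNT(*) FROM posts WHERE category = ?",
                  (c,)).fetchone()[0]
    for c in categories
}

# Better: one query for every category at once.
fast = dict(db.execute("SELECT category, COUNT(*) FROM posts GROUP BY category"))

assert slow == fast
```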
As previously noted, we’ll be doing an emergency site upgrade ASAP. Once we’re upgraded to WordPress 2.5, I hope to deploy a rich set of back-end plug-ins. A caching plug-in will be among them.
Categories: About this blog, Application areas, Cache | 1 Comment |
EnterpriseDB unveils Postgres Plus
EnterpriseDB is making a series of moves and announcements. Highlights include:
- Renaming/repositioning the product as “Postgres Plus.” The free product is now Postgres Plus, while the version you pay EnterpriseDB for is now Postgres Plus Advanced Server.
- Repackaging the products, so that Postgres Plus Advanced Server is a strict superset of Postgres Plus.
- New features added to Postgres Plus Advanced Server.
- Features newly migrated from Advanced Server down to Postgres Plus.
- A strategic investment by IBM.
- Stressing Postgres in EnterpriseDB marketing, and dropping the tag-line defining themselves as “the Oracle-compatible database company.”
So far as I can tell, most of the technical differences between Advanced Server and regular Postgres Plus lie in three areas: Read more
Categories: Cache, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Mid-range, MySQL, OLTP, Open source, PostgreSQL | 1 Comment |
CEP is entering BI
I talked with both Coral8 and Truviso this afternoon. They both have their financial services efforts, of course. Coral8 also continues to get business doing data reduction for sensor networks — mainly RFID and utilities, I think. Coral8 is working on some really cool and confidential other stuff as well.
But my biggest takeaway from this pair of calls was that Coral8 and Truviso are penetrating general BI. Read more
Categories: Aleri and Coral8, Analytic technologies, Business intelligence, Memory-centric data management, Streaming and complex event processing (CEP), Truviso | Leave a Comment |
What to call CEP
It seems that the CEP folks are still concerned about what to call themselves. There really are only three choices:
- Complex event processing
- Event processing
- Event stream processing
“Stream processing” might once have been on the list, but it has too many other meanings, and “streaming” adds more meanings yet.
“Complex” has the virtue of inertia; CEP is the closest thing the category has to an agreed-upon name. But few people want to buy technology that describes itself as being “complex.” And in any case it’s not clear how complex many of those events are. “Event stream processing” isn’t terribly well established, and to some extent it runs afoul of the same ambiguities as “stream processing.” What’s worse, those names lead to four-word product category names. Who really wants to market or hear about “complex event processing engines” or “event stream processing platforms”?
So let’s just call the category “event processing” and have done with it, OK? Products can, if they want, be “event processing somethings.” Names like that wouldn’t be any more of a mouthful than “data warehouse appliance,” and the latter category is doing pretty well for itself.
More Twitter weirdness
Twitter commonly has the problem of duplicate tweets. That is, if you post a message, it shows up twice. After a little while, the dupe disappears, but if you delete the dupe manually, the original is gone too.
I presume what’s going on is that tweets are cached, eventually batched to disk, and not always deleted from cache until some time after they’re persisted. If you happen to check the page of your recent tweets in between — boom, you get two hits. But what I don’t understand is why the two versions have different timestamps.
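Here’s a toy model of the behavior I’m guessing at; everything in it is conjecture about Twitter’s internals, not anything they’ve documented:

```python
import time

# Conjectured write-behind behavior: tweets land in a cache, are
# flushed to disk in batches, and linger in the cache after flushing.
# A reader who merges both copies sees a duplicate; and if the disk
# copy gets stamped at flush time, the two copies' timestamps disagree.
cache = []   # (tweet_id, text, timestamp)
disk = []

def post(tweet_id, text):
    cache.append((tweet_id, text, time.time()))

def flush_to_disk():
    for tweet_id, text, _posted_at in cache:
        # Bug hypothesis: the persisted row is stamped at flush time,
        # not at posting time, so the timestamps differ.
        disk.append((tweet_id, text, time.time()))
    # Note: the cache is NOT cleared right away.

def recent_tweets():
    return disk + cache   # merging both sources yields duplicates

post(1, "hello")
time.sleep(0.01)
flush_to_disk()
print(recent_tweets())    # tweet 1 appears twice, with two timestamps
```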
Presumably, this could be explained at a MySQL User Conference session next month, one of whose topics will be “Intelligent caching strategies using a hybrid MemCache/MySQL approach.” I’m so glad they don’t use stupid strategies to do this … Read more
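For reference, the usual pattern behind a hybrid memcached/MySQL setup is cache-aside: check the cache first, fall back to the database on a miss, and populate the cache for next time. A minimal sketch, with a plain dict standing in for memcached and a stub function standing in for a MySQL query:

```python
# Cache-aside sketch: the dict stands in for memcached, the stub
# function for a real MySQL query. Names are illustrative only.
cache = {}

def query_database(user_id):
    return {"id": user_id, "name": f"user{user_id}"}  # stand-in for SQL

def get_user(user_id):
    if user_id in cache:
        return cache[user_id]       # cache hit: no database round trip
    row = query_database(user_id)   # cache miss: hit MySQL once
    cache[user_id] = row            # populate cache for next time
    return row
```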
Categories: Cache, MySQL, OLTP, Specific users | 3 Comments |
ObjectGrid versus H-Store
Billy Newport of IBM sees a lot of similarities between his app-server-based product ObjectGrid and H-Store. In both cases, constrained tree schemas are assumed, and OLTP performance goodness ensues. A couple of points I noted on a quick skim through his blog:
- He calls out RAM consumption as a challenge for this kind of architecture.
- He points out that it’s a big advantage to have data cached and used in the same address space.
Being based in RAM is obviously a huge part of the H-Store scheme. But so is having transaction execution be close to the database.
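A minimal sketch of that shared idea: when the working set lives in the same address space as the transaction logic, a transaction is just a local function call over in-memory structures, with no per-statement network round trip to a separate server. The schema is invented for illustration:

```python
# When data lives in the application's (or engine's) own address
# space, a transaction is a local function call over in-memory
# structures -- a few dict operations instead of network round trips.
accounts = {"a": 100, "b": 50}   # entire working set held in RAM

def transfer(src, dst, amount):
    # Runs to completion in the same address space as the data.
    if accounts[src] < amount:
        raise ValueError("insufficient funds")
    accounts[src] -= amount
    accounts[dst] += amount

transfer("a", "b", 25)
```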
IBM now has both ObjectGrid and a memory-centric DBMS (solidDB), which it has been using as a cache in front of disk-based DBMS. Integration of the two could be pretty interesting.
Categories: Cache, IBM and DB2, Memory-centric data management, OLTP, solidDB, Theory and architecture, VoltDB and H-Store | Leave a Comment |
The architectural assumptions of H-Store
I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
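As I understand one of those design choices, each data partition gets a single thread that executes queued transactions serially, to completion; with no concurrent access to a partition’s data, much of the locking and latching overhead disappears. A rough illustration, not H-Store’s actual code:

```python
import queue
import threading

# Rough illustration (as I understand the design): each partition has
# one thread that runs queued transactions one at a time, to
# completion. With no concurrent access to the partition's data,
# isolation comes for free and locking/latching disappear.
class Partition:
    def __init__(self):
        self.data = {}
        self.work = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            txn, done = self.work.get()
            txn(self.data)     # serial execution: no locks needed
            done.set()

    def execute(self, txn):
        done = threading.Event()
        self.work.put((txn, done))
        done.wait()

p = Partition()
p.execute(lambda data: data.update(balance=100))
p.execute(lambda data: data.update(balance=data["balance"] - 25))
```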
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi. Read more
Categories: Database diversity, In-memory DBMS, Memory-centric data management, OLTP, VoltDB and H-Store | 5 Comments |
Mike Stonebraker calls for the complete destruction of the old DBMS order
Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.
Categories: Database diversity, In-memory DBMS, Memory-centric data management, Michael Stonebraker, OLTP, Theory and architecture, VoltDB and H-Store | 36 Comments |
Fixing Twitter in three letters: CEP
There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So, let’s say there’s a much higher volume at peak times, also hypothesize that Twitter would like to grow a lot, and say that Twitter would like to handle 10,000–100,000 messages/minute — i.e., 1,000+ per second at the high end — as soon as possible.
That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh. Almost no Twitter user seems to have more than about 7,000 followers, not even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.
*I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet. Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.
Here’s how to implement that. Read more
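In the meantime, here’s a back-of-the-envelope sketch of the core fan-out such an engine would perform per tweet. With roughly 1,000 tweets per second and a few hundred followers each, that’s on the order of a few hundred thousand cheap in-memory appends per second, which is modest by Wall Street standards. All names are illustrative:

```python
from collections import defaultdict

# Back-of-the-envelope fan-out per tweet: look up the author's
# followers and append the message to each follower's timeline plus
# the public one. At ~1,000 tweets/second with a few hundred followers
# each, that's a few hundred thousand in-memory appends per second.
followers = defaultdict(set)     # author -> set of follower ids
timelines = defaultdict(list)    # user (or "public") -> messages

def follow(follower, author):
    followers[author].add(follower)

def on_tweet(author, text):
    timelines["public"].append((author, text))
    for f in followers[author]:
        timelines[f].append((author, text))

follow("alice", "scoble")
follow("bob", "scoble")
on_tweet("scoble", "hello from Macworld")
print(timelines["alice"])   # [('scoble', 'hello from Macworld')]
```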