Presentations
Posts focused on live presentations, typically by Curt Monash.
Flash, other solid-state memory, and disk
If there’s one subject on which the New England Database Summit changed or at least clarified my thinking,* it’s future storage technologies. Here’s what I now think:
- Solid-state memory will soon be the right storage technology for a large fraction of databases, OLTP and analytic alike. I’m not sure whether the initial cutoff in database size is best thought of as terabytes or 10s of terabytes, but it’s in that range. And it will increase over time, for the usual cheaper-parts reasons.
- That doesn’t necessarily mean flash. PCM (Phase-Change Memory) is coming down the pike, with perhaps 100X the durability of flash, in terms of the total number of writes it can tolerate. On the other hand, PCM has issues in the face of heat. More futuristically, IBM is also high on magnetic racetrack memory. IBM likes the term storage-class memory to cover all this — which I find regrettable, since the acronym SCM is way overloaded already. 🙂
- Putting a disk controller in front of solid-state memory is really wasteful. It wreaks havoc on I/O rates.
- Generic PCIe interfaces don’t suffice either, in many analytic use cases. Their I/O is better, but still not good enough. (Doing better yet is where Petascan – the stealth-mode company I keep teasing about – comes in.)
- Disk will long be useful for very large databases. Kryder’s Law, about disk capacity, shows at least as high an annual improvement as Moore’s Law does for chip capacity, the disk rotation speed bottleneck notwithstanding. Disk will long be much cheaper than silicon for data storage. And cheaper silicon in sensors will lead to ever more machine-generated data that fills up a lot of disks.
- Disk will long be useful for archiving. Disk is the new tape.
*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.
Related links
- A slide deck by Mohan of IBM similar to the one he presented at the NEDB Summit about storage-class memories.
- A much more detailed IBM presentation on storage-class memories.
- Oracle’s and Teradata’s beliefs about the importance of solid-state memory.
Other posts based on my January, 2010 New England Database Summit keynote address
- Data-based snooping — a huge threat to liberty that we’re all helping make worse
- Interesting trends in database and analytic technology
- Open issues in database and analytic technology
Categories: Data warehousing, Michael Stonebraker, Presentations, Solid-state memory, Storage, Theory and architecture | 3 Comments |
Data-based snooping — a huge threat to liberty that we’re all helping make worse
Every year or two, I get back on my soapbox to say:
- Database and analytic technology, as they evolve, will pose tremendous danger to individual liberties.
- We in the industry who are creating this problem also have a duty to help fix it.
- Technological solutions alone won’t suffice. Legal changes are needed.
- The core of the needed legal changes is tight restrictions on governmental use of data, because relying on restrictions about data acquisition and retention clearly won’t suffice.
But this time I don’t plan to be so quick to shut up.
My best writing about the subject of liberty to date is probably in a November, 2008 blog post. My best public speaking about the subject was undoubtedly last Thursday, early in my New England Database Summit keynote address; I got a lot of favorable feedback on that part from the academics and technologists in attendance.
My emphasis is on data-based snooping rather than censorship, for several reasons:
- My work and audience are mainly in the database and analytics sectors. Censorship is more a concern for security, networking, and internet-technology folks.
- After censorship, I think data-based snooping is the second-worst technological threat to liberty.
- In the US and other fairly free countries, data-based snooping may well be the #1 threat.
Categories: Analytic technologies, Data warehousing, Presentations, Surveillance and privacy | 8 Comments |
New England Database Summit (January 28, 2010)
New England Database Day has now, in its third year, become a “Summit.” It’s a nice event, providing an opportunity for academics and business folks to mingle. The organizers are basically the local branch of the Mike Stonebraker research tree, with this year’s programming head being Daniel Abadi. It will be on Thursday, January 28, 2010, once again in the Stata Center at MIT. It would be reasonable to park in the venerable 4/5 Cambridge Center parking lot, especially if you’d like to eat at Legal Sea Foods afterwards.
So far there are two confirmed speakers — Raghu Ramakrishnan of Yahoo and me. My talk title will be something like “Database and analytic technology: The state of the union”, with all wordplay intended.
There’s more information at the official New England Database Summit website. There’s also a post with similar information on Daniel Abadi’s DBMS Musings blog.
Edit after the event:
Posts based on my January, 2010 New England Database Summit keynote address
- Data-based snooping — a huge threat to liberty that we’re all helping make worse
- Flash, other solid-state memory, and disk
- Interesting trends in database and analytic technology
- Open issues in database and analytic technology
Categories: Analytic technologies, Data warehousing, Michael Stonebraker, Presentations, Theory and architecture | 4 Comments |
Boston Big Data Summit keynote outline
Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.
MapReduce webinars and annotated slides
As previously noted, I’m giving a webinar twice today — i.e., Thursday, October 15 — at 10:00 am and 1:00 pm Eastern time.
- The subject is MapReduce.
- The sponsor is Aster Data.
- Part of the webinar will be an explanation of MapReduce basics, especially the conflict between theory/propaganda and reality.
- As you might guess from the identity of the sponsor, there will be an emphasis on how MapReduce and SQL play nicely with each other.
- You can register for the webinar on Aster’s site.
- (Edit) The webinar replay can be found here.
- I’ve already uploaded the slides from which I will present. (But not the ones from which Aster folks will be talking. I’ve seen those, and there’s some good technical crunch in some of them.) The “Notes” under the slides have a number of relevant URLs for follow-up, as well as a small number of explanatory comments (e.g., as to why one slide simply has a quote from and corresponding picture of Shakespeare).
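For readers who want the "MapReduce basics" and the SQL connection in concrete form, here is a minimal single-process sketch. It is a toy, not Aster's SQL/MapReduce and not any real framework's API; the names `map_reduce`, `emit_region_amount`, and `sum_amounts`, and the `sales` rows, are all made up for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy MapReduce: map each record to (key, value) pairs, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: one input record may emit zero or more (key, value) pairs.
        for key, value in mapper(record):
            # "Shuffle": collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input rows of (region, amount).
sales = [("east", 10), ("west", 5), ("east", 7)]


def emit_region_amount(record):
    region, amount = record
    yield region, amount


def sum_amounts(key, values):
    return sum(values)


# Roughly what SQL expresses as:
#   SELECT region, SUM(amount) FROM sales GROUP BY region
totals = map_reduce(sales, emit_region_amount, sum_amounts)
```

The point of the comparison is only that a simple map-then-reduce job and a `GROUP BY` aggregate describe the same computation; a real system distributes the map, shuffle, and reduce steps across many machines.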
Categories: Aster Data, MapReduce, Presentations | 6 Comments |
I have some presentations coming up (all on October Thursdays)
On Thursday, October 15, at two different times (10:00 am and 1:00 pm Eastern time), I’ll be giving a webinar for Aster Data on MapReduce. The content is very much work in progress, but it definitely will:
- Be overviewy in nature
- Emphasize SQL/MapReduce integration
Then, on the evening of Thursday, October 22, there’s something called the Boston Big Data Summit, in Waltham, where “Big Data” evidently is to be construed as anything from a few terabytes on up. (Things are smaller in the Northeast than in California …) It’s being put together by Amrith Kumar (who I don’t really know) and Bob Zurek (who everybody knows). This is the inaugural meeting. It seems I’m both giving the keynote and running the subsequent panel, one of whose participants will be Ellen Rubin. Read more
Categories: Analytic technologies, Aster Data, Cloud computing, MapReduce, Presentations | 4 Comments |
Thinking about analytic speed
For a variety of reasons, I don’t plan to post my complete Enzee Universe keynote slide deck soon, if ever. But perhaps one or more of its subjects are worth spinning out in their own blog posts.
I’m going to start with analytic speed or, equivalently, analytic latency. There is, obviously, a huge industry emphasis on speed. Indeed, there’s so much emphasis that confusion often ensues. My goal in this post is not really to resolve the confusion; that would be ambitious to the max. But I’m at least trying to call attention to it, so that we can all be more careful in our discussions going forward, and perhaps contribute to a framework for those discussions as well.
Key points include:
1. There are two important senses of “latency” in analytics. One is just query response time. The other is the length of the interval between when data is captured and when it is available for analytic purposes. They’re often conflated — and indeed I shall do so for the remainder of this post.
2. There are many different kinds of analytic speed, which to a large extent can be viewed separately. Major areas include:
- Data exploration. In-memory OLAP is a huge trend, and QlikView is a hot BI product line.
- Budgeting/planning. In an unprecedentedly frightening economy, annual planning/forecasting cycles may well be too slow.
- Operational integration. This is probably the biggest current area of mission-critical IT advancement. Not coincidentally, it is also the mainstay of the most expensive and complex data warehousing technologies. It’s also an ongoing area of application for event/stream processing, aka CEP.
- General or deep analytics. This is what I seem to spend much of my time writing about — data warehousing price/performance, parallelized data mining, and much more.
- Data administration. Ease of data mart spin-out and administration is becoming a major concern. And of course analytic appliance and DBMS vendors have been telling ease-of-deployment, low-DBA-involvement kinds of stories at least since Netezza first came to market.
There certainly are relationships among those; e.g., a really great analytic DBMS could help speed up any and all of the last three categories. But when assessing your needs, you can go quite far viewing each of those areas separately.
3. It is indeed important to carefully assess your need-for-speed. Acceptable levels of analytic latency vary widely, ranging from sub-millisecond to multi-month. Read more
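The two senses of "latency" in point 1 are easy to keep apart once they are written down as separate measurements. The following is a minimal sketch under assumed names: `LatencySample` and its four timestamp fields are hypothetical, and the example times are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LatencySample:
    captured_at: datetime      # when the data was captured at its source
    available_at: datetime     # when it became available for analysis
    query_submitted: datetime  # when an analytic query was issued
    query_answered: datetime   # when that query returned


def query_response_time(s: LatencySample) -> timedelta:
    """Sense 1: how long the query itself took."""
    return s.query_answered - s.query_submitted


def data_availability_latency(s: LatencySample) -> timedelta:
    """Sense 2: the interval between data capture and analytic availability."""
    return s.available_at - s.captured_at


# Invented example: data loaded six hours after capture, then queried in three seconds.
t0 = datetime(2009, 9, 2, 9, 0, 0)
sample = LatencySample(
    captured_at=t0,
    available_at=t0 + timedelta(hours=6),
    query_submitted=t0 + timedelta(hours=7),
    query_answered=t0 + timedelta(hours=7, seconds=3),
)
```

In this made-up case a fast query sits on top of a slow load, which is exactly the situation where conflating the two numbers misleads.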
Categories: Analytic technologies, Business intelligence, Data warehousing, Presentations | 5 Comments |
Netezza’s worldwide show-and-tell
In this economy, conference attendance is way down. Accordingly, a number of vendors have reevaluated whether it makes sense to have a traditional big-bang user conference, or whether it might make more sense to do a tour, bringing their message to multiple geographical areas. Netezza has opted for the latter course, something I’ve been well aware of for two reasons:
- Planning for the conferences and for Netezza’s product roll-out is of course coordinated, and product roll-out is something I advise my clients on.
- Netezza engaged me to speak at six different versions of the event (i.e., America and Europe, but not the Far East). There’s still time to contribute suggestions about my talk here.
Apparently, I’ll be talking late morning each time. My dates are:
- September 2, Boston
- September 9, Washington, DC
- September 15, Milan
- September 17, London
- September 24, San Francisco
- September 29, Chicago
The brand name of the events is Enzee Universe. Locations, registration information, and other particulars may be found on the Enzee Universe website.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Presentations | 2 Comments |
37 Ways To Get More From Analytics, Version 2.0
As I hoped, there were some very helpful responses to my post listing ways to improve analytic effectiveness. Here’s a second draft incorporating them. Comments continue to be very welcome. I need to finalize this soon. Read more
Categories: Analytic technologies, Business intelligence, Data warehousing, Presentations, Web analytics | 4 Comments |
37 Ways To Get More From Analytics
I posted several stages of my thinking in connection with a February presentation on how to buy an analytic DBMS. The whole process seemed like a success, with good input early on, and at least one new client directly attracted by the uploaded slide presentation. So now I’m trying the same idea again, starting at an even earlier stage of the process.
I’m going to be speaking this September at six of the seven installments of Netezza’s 2009 traveling regional user conference, namely those in London, Milan, and the United States. (Edited for schedule changes.) The topic is going to be something like “N Ways to Get More From Analytics”, for N a decent-sized two-digit integer. The talk is meant to be more conceptual, upbeat, rah-rah, and/or inspirational than is my usual style, at the cost of perhaps being less complete, detailed, or carefully organized. Right now I’m at the point of sharing an initial list of ideas, and throwing open the question: What did I leave out?
The initial list is: Read more