Scientific research
Discussion of how database and related technologies are used to support scientific research. Related subjects include:
Business intelligence notes and trends
I keep not finding the time to write as much about business intelligence as I’d like to. So I’m going to do one omnibus post here covering a lot of companies and trends, then circle back in more detail when I can. Top-level highlights include:
- Jaspersoft has a new v3.5 product release. Highlights include multi-tenancy for SaaS and another in-memory OLAP option. Otherwise, things sound qualitatively much as I wrote last September.
- Inforsense has a cool composite-analytical-applications story. More precisely, they said my phrase “analytics-oriented EAI” was an “exceptionally good” way to describe their focus. Inforsense’s biggest target market seems to be health care, research and clinical alike. Financial services is next in line.
- Tableau Software “gets it” a little bit more than other BI vendors about the need to decide for yourself how to define metrics. (Of course, it’s possible that other “exploration”-oriented new-style vendors are just as clued-in, but I haven’t asked in the right way.)
- Jerome Pineau’s favorable view of Gooddata and unfavorable view of Birst are in line with other input I trust. I’ve never actually spoken with the Gooddata folks, however.
- Seth Grimes suggests the qualitative differences between open-source and closed-source BI are no longer significant. He has a point, although I’d frame it more as being about the difference between the largest (but acquisition-built) BI product portfolios and the smaller (but more home-grown) ones, counting open source in the latter group.
- I’ve discovered about five different in-memory OLAP efforts recently, and no doubt that’s just the tip of the iceberg.
- I’m hearing ever more about public-facing/extranet BI. Information Builders is a leader here, but other vendors are talking about it too.
A little more detail … Read more
Categories: Application areas, Business intelligence, Information Builders, Inforsense, Jaspersoft, QlikTech and QlikView, Scientific research, Tableau Software | 8 Comments |
Kognitio and WX-2 update
I went to Bracknell Wednesday to spend time with the Kognitio team. I think I came away with a better understanding of what the technology is all about, and why certain choices have been made.
Like almost every other contender in the market,* Kognitio WX-2 queries disk-based data in the usual way. Even so, WX-2’s design is very RAM-centric. Data gets on and off disk in mind-numbingly simple ways – table scans only, round-robin partitioning only (as opposed to the more common hash partitioning), and no compression. However, once the data is in RAM, WX-2 gets to work, happily redistributing data as seems optimal, with little concern about which node retrieved it in the first place. (I must confess that I don’t yet understand why this strategy doesn’t create ridiculous network bottlenecks.) How serious is Kognitio about RAM? Well, they believe they’re in the process of selling a system that will include 40 terabytes of the stuff. Apparently, the total hardware cost will be in the $4 million range.
*Exasol is the big exception. They basically use disk as a source from which to instantiate in-memory databases.
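To make the partitioning contrast concrete, here’s a minimal sketch of round-robin versus hash distribution of rows across nodes. The function names and toy rows are my own illustration, not Kognitio code.

```python
# Hypothetical sketch of how rows might be distributed across MPP nodes.
# Illustration only; not Kognitio code.

def round_robin_partition(rows, num_nodes):
    """Deal rows out to nodes in turn, ignoring their contents.
    Loading stays trivially simple and evenly balanced, but rows sharing
    a join/aggregation key can land on any node, so co-locating them
    has to happen later, via redistribution in RAM."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions

def hash_partition(rows, num_nodes, key_column):
    """The more common alternative: route each row by a hash of its key,
    so rows with the same key are co-located at load time."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key_column]) % num_nodes].append(row)
    return partitions

rows = [{"customer_id": n, "amount": n * 10} for n in range(8)]
print(round_robin_partition(rows, 3))
print(hash_partition(rows, 3, "customer_id"))
```

The apparent trade-off: round-robin keeps loading dead simple and perfectly balanced, at the price of pushing all the co-location work into the in-RAM redistribution step.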
Other technical highlights of the Kognitio WX-2 story include: Read more
Categories: Application areas, Data warehousing, Kognitio, Scientific research | 2 Comments |
Big scientific databases need to be stored somehow
A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:
- Microsoft just put out an overwrought press release. The substance seems to be that Pan-STARRS (a Jim Gray legacy, also discussed in an August 2008 Computerworld article) is adding 1.4 terabytes of image data per night, and one not-so-new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding. Both run on SQL Server, of course.
- Kognitio has an astronomical database too, at Cambridge University, adding half a terabyte of data per night.
- Oracle is used for a McGill University proteomics database called CellMapBase. A figure of 50 terabytes of “mass storage” is cited, which doesn’t count tape backup and the like.
- The Large Hadron Collider, once it actually starts functioning, is projected to generate 15 petabytes of data annually, which will be initially stored on tape and then distributed to various computing centers around the world.
- Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I’m not thinking of a major customer it has in that area. (Then again, if you just sell software, your academic discount can approach 100%; if, like Netezza, you have an actual cost of goods sold, that’s not as appealing an option.)
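For a rough sense of scale, here’s some back-of-the-envelope arithmetic annualizing the nightly figures above. The 365-night assumption is mine, and these are approximations, not vendor-supplied totals.

```python
# Back-of-the-envelope annualization of the ingest figures above.
# Assumes 365 ingest nights per year; approximations, not vendor numbers.

NIGHTS_PER_YEAR = 365

pan_starrs_tb = 1.4 * NIGHTS_PER_YEAR      # ~511 TB/year of image data
cambridge_tb = 0.5 * NIGHTS_PER_YEAR       # ~183 TB/year
protein_folding_tb = 15                    # stated directly as 15 TB/year
lhc_tb = 15 * 1000                         # 15 petabytes/year = 15,000 TB/year

for name, tb in [("Pan-STARRS", pan_starrs_tb),
                 ("Cambridge astronomy (Kognitio)", cambridge_tb),
                 ("Protein folding simulations", protein_folding_tb),
                 ("Large Hadron Collider", lhc_tb)]:
    print(f"{name}: ~{tb:,.0f} TB/year")
```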
Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.
Categories: Aster Data, Data types, Greenplum, IBM and DB2, Kognitio, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, PostgreSQL, Scientific research | 1 Comment |
Is MapReduce a good underpinning for next-gen scientific DBMS?
Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:
Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.
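To ground the question in the title, here’s a minimal sketch of the MapReduce pattern itself: a user-defined function applied record by record in a map phase, with results grouped and combined in a reduce phase. The function names and toy sensor data are purely illustrative, not any particular system’s implementation.

```python
# Minimal MapReduce-style sketch: a user-defined function applied in a map
# phase, with results grouped and combined in a reduce phase.
# Hypothetical illustration; not any particular system's implementation.
from collections import defaultdict

def map_phase(records, user_defined_fn):
    """Run the UDF on every record, emitting (key, value) pairs."""
    for record in records:
        yield from user_defined_fn(record)

def reduce_phase(pairs, combine_fn):
    """Group emitted values by key, then combine each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: combine_fn(values) for key, values in groups.items()}

# Toy example: average simulated detector readings per sensor.
readings = [
    {"sensor": "A", "value": 1.0},
    {"sensor": "B", "value": 0.5},
    {"sensor": "A", "value": 2.0},
]
emit_by_sensor = lambda r: [(r["sensor"], r["value"])]
averages = reduce_phase(map_phase(readings, emit_by_sensor),
                        lambda values: sum(values) / len(values))
print(averages)  # {'A': 1.5, 'B': 0.5}
```

The open question Stonebraker raises is whether such user-defined logic should run inside a DBMS, Postgres-style, with metadata and lineage tracked alongside, rather than in a standalone MapReduce framework.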
Categories: Data types, MapReduce, Scientific research | Leave a Comment |