Application areas
Posts focusing on the use of database and analytic technologies in specific application domains. Related subjects include:
- Any subcategory
- (in Text Technologies) Specific application areas for text analytics
Notes, links, and comments January 20, 2011
I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)
First and foremost, the fourth annual New England Database Summit (nee “Day”) is next week, specifically Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.
The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.
One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more. Read more
Categories: About this blog, Analytic technologies, Data warehousing, GIS and geospatial, Investment research and trading, MongoDB, OLTP, Open source, PostgreSQL, Vertica Systems | 4 Comments |
The technology of privacy threats
This post is the second of a series. The first one was an overview of privacy dangers, replete with specific examples of kinds of data that are stored for good reasons, but can also be repurposed for more questionable uses. More on this subject may be found in my August, 2010 post Big Data is Watching You!
There are two technology trends driving electronic privacy threats. Taken together, these trends raise scenarios such as the following:
- Your web surfing behavior indicates you’re a sports car buff, and you further like to look at pictures of scantily-clad young women. A number of your Facebook friends are single women. As a result, you’re deemed a risk to have a mid-life crisis and divorce your wife, thus increasing the interest rate you have to pay when refinancing your house.
- Your cell phone GPS indicates that you drive everywhere, instead of walking. There is no evidence of you pursuing fitness activities, but forum posting activity suggests you’re highly interested in several TV series. Your credit card bills show that your taste in restaurant food tends to the fatty. Your online photos make you look fairly obese, and a couple have ashtrays in them. As a result, you’re judged a high risk of heart attack, and your medical insurance rates are jacked up accordingly.
- You did actually have that mid-life crisis and get divorced. At the child-custody hearing, your ex-spouse’s lawyer quotes a study showing that football-loving upper income Republicans are 27% more likely to beat their children than yoga-class-attending moderate Democrats, and the probability goes up another 8% if they ever bought a jersey featuring a defensive lineman. What’s more, several of the more influential people in your network of friends also fit angry-male patterns, taking the probability of abuse up another 13%. Because of the sound statistics behind such analyses, the judge listens.
Not all these stories are quite possible today, but they aren’t far off either.
Categories: Facebook, Predictive modeling and advanced analytics, Surveillance and privacy, Telecommunications, Web analytics | 4 Comments |
Privacy dangers — an overview
This post is the first of a series. The second one delves into the technology behind the most serious electronic privacy threats.
The privacy discussion has gotten more active, and more complicated as well. A year ago, I still struggled to get people to pay attention to privacy concerns at all, at least in the United States, with my first public breakthrough coming at the end of January. But much has changed since then.
On the commercial side, Facebook modified its privacy policies, garnering great press attention and an intense user backlash, leading to a quick partial retreat. The Wall Street Journal then launched a long series of articles — 13 so far — recounting multiple kinds of privacy threats. Other media joined in, from Forbes to CNet. Various forms of US government rule-making to inhibit advertising-related tracking have been proposed as an apparent result.
In the US, the government had a lively year as well. The Transportation Security Administration (TSA) rolled out what have been dubbed “porn scanners,” and backed them up with “enhanced patdowns.” For somebody who is, for example, female, young, a sex abuse survivor, and/or a follower of certain religions, those can be highly unpleasant, if not traumatic. Meanwhile, the Wikileaks/Cablegate events have spawned a government reaction whose scope is only beginning to be seen. A couple of “highlights” so far are some very nasty laptop seizures, and the recent demand for information on over 600,000 Twitter accounts. (Christopher Soghoian provided a detailed, nuanced legal analysis of same.)
At this point, it’s fair to say there are at least six different kinds of legitimate privacy fear. Read more
Categories: Analytic technologies, Facebook, GIS and geospatial, Health care, Surveillance and privacy, Telecommunications, Web analytics | 6 Comments |
MarkLogic and its document DBMS
This post has been long in the writing for several reasons, the biggest being that I stopped working for almost a month due to family issues. Please forgive its particularly choppy writing style; having waited this long already, I now lack the patience to further clean it up.
MarkLogic:
- Is an ACID-compliant, document-oriented, non-SQL, XML-based scale-out DBMS vendor of non-trivial size and momentum.
- Still has the same technical approach I previously described.
- Recently posted an internally-written white paper with a lot of technical detail.
- Recently had a point release — MarkLogic 4.2 — a lot of which seems to be “Oh, you didn’t have that before?” kinds of stuff.
- Has given me permission to post most of the slides from same, the first few of which give a nice overview of the MarkLogic story.
- Claims 200+ each of customers and employees (that’s from a slide MarkLogic did ask me to remove from the deck).
- Is a client again.
- Not coincidentally, is interested in branching out past the vertical markets of media and government/intelligence, in particular to the financial services market.
- Has finally rationalized its company and product names so that both are now “MarkLogic.” 🙂
- Has finally grasped that if it is proud of its ACID-compliance it probably shouldn’t be trying to market itself as “NoSQL”. 🙂
The privacy discussion is heating up
Internet privacy issues are getting more and more attention. Frankly, I think we’re getting past the point where the only big risk is loss of liberty. More and more, the risk of an excessive backlash is upon us as well. (In the medical area, I’d say it’s already more than a risk — it’s a life-wrecking reality. But now the problem is poised to become wider-spread.) Read more
Categories: Health care, Surveillance and privacy, Web analytics | 2 Comments |
Introduction to Kaminario
At its core, the Kaminario story is simple:
- Throw out your disks and replace them with, not Flash, but actual DRAM.
-
Your IOPS (Input/Output Per Second) are so high* that you get the performance you need without any further system changes.
- The whole thing is very fast to set up.
In other words, Kaminario pitches a value proposition something like (my words, not theirs) “A shortcut around your performance bottlenecks.”
*1 million or so on the smallest Kaminario K2 appliance.
Kaminario asserts that both analytics and OLTP (OnLine Transaction Processing) are represented in its user base. Even so, the use cases Kaminario mentioned seemed to be concentrated on the analytic side. I suspect there are two main reasons:
- As Kaminario points out, OLTP apps commonly are designed to perform in the face of regrettable I/O wait.
- Also, analytic performance problems tend to arise more suddenly than OLTP ones do.*
*Somebody can think up a new analytic query overnight that takes 10 times the processing of anything they’ve ever run before. Or they can get the urge to run the same queries 10 times as often as before. Both those kinds of thing happen less often in the OLTP world.
Accordingly, Kaminario likes to sell against the alternative of getting a better analytic DBMS, stressing that you can get a Kaminario K2 appliance into production a lot faster than you can move your processing to even the simplest data warehouse appliance. Kaminario is probably technically correct in saying that; even so, I suspect it would often make more sense to view Kaminario K2 appliances as a transition technology, by which I mean:
- You have an annoying performance problem.
- Kaminario K2 could solve it very quickly.
- That buys you time for a more substantive fix.*
- If you want, you can redeploy your Kaminario K2 storage to solve your next-worst performance bottleneck.
On that basis, I could see Kaminario-like devices eventually getting to the point that every sufficiently large enterprise should have some of them, whether or not that enterprise has an application it believes should run permanently against DRAM block storage. Read more
Categories: Investment research and trading, Kaminario, Solid-state memory, Storage, Telecommunications, Web analytics | 7 Comments |
More notes on Membase and memcached
As a companion to my post about Membase last week, the company has graciously allowed me to post a rather detailed Membase slide deck. (It even has pricing.) Also, I left one point out.
Membase announced a Cloudera partnership. I couldn’t detect anything technically exciting about that, but it serves to highlight what I do find to be an interesting usage trend. A couple of big Web players (AOL and ShareThis) are using Hadoop to crunch data and derive customer profile data, then feed that back into Membase. Why Membase? Because it can serve up the profile in a millisecond, as part of a bigger 40-millisecond-latency request.
And why Hadoop, rather than Aster Data nCluster, which ShareThis also uses? Umm, I didn’t ask.
When I mentioned this to Colin Mahony, he said Vertica had similar stories. However, I don’t recall whether they were about Membase or just memcached, and he hasn’t had a chance to get back to me with clarification. (Edit: As per Colin’s comment below, it’s both.)
Categories: Aster Data, Cache, Cloudera, Couchbase, Hadoop, memcached, Memory-centric data management, NoSQL, Pricing, Specific users, Vertica Systems, Web analytics | 7 Comments |
Where ParAccel is at
Until recently, I was extremely critical of ParAccel’s marketing. But there was an almost-clean sweep of the relevant ParAccel executives, and the specific worst practices I was calling out have for the most part been eliminated. So I was open to talking and working with ParAccel again, and that’s now happening. On my recent California trip, I chatted with three ParAccel folks for a few hours. Based on that and other conversation, here’s the current ParAccel story as I understand it.
Read more
Another medical records rant
I’ve previously ranted about the medical information problems in connection with my father’s care at Friendship Village of Dublin and Riverside Methodist Hospital (among others). Well, they’re getting worse. Read more
Categories: Health care | 1 Comment |
A few notes from XLDB 4
As much as I believe in the XLDB conferences, I only found time to go to (a big) part of one day of XLDB 4 myself. In general: Read more