Groupon-related thoughts on the future of advertising and e-commerce
There’s been a lot of debate about Groupon around its initial public offering, and I find the Groupon bears more persuasive than the Groupon bulls. That said, I want to share at length a Groupon-optimistic argument by Steve Cheney, because it outlines some possibilities for the continued evolution of analytics.
Categories: Analytic technologies, Specific users
Hardware for Hadoop
After I suggested that there’s little point to Hadoop appliances, it occurred to me to look into what kinds of hardware are actually used with Hadoop. So far as I can tell:
- Hadoop nodes today tend to run on fairly standard boxes.
- Hadoop nodes in the past have tended to run on boxes that were light with respect to RAM.
- The number of spindles per core on Hadoop node boxes is going up even as disks get bigger.
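The trends above are easiest to see as per-core ratios. The usual rationale, which is an assumption here rather than something stated above, is that Hadoop jobs tend to be bound by sequential disk I/O, so more independent spindles per core keep each core fed. The sketch below defines a hypothetical `NodeConfig` type and compares two illustrative boxes; the numbers were chosen to mirror the trend described, not taken from any survey.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical Hadoop worker-node configuration (illustrative only)."""
    cores: int
    disks: int            # spindles, i.e. physical drives
    tb_per_disk: float
    ram_gb: int

    def spindles_per_core(self) -> float:
        return self.disks / self.cores

    def tb_per_core(self) -> float:
        return self.disks * self.tb_per_disk / self.cores

    def ram_gb_per_core(self) -> float:
        return self.ram_gb / self.cores

# Illustrative numbers only, not measurements of real deployments.
older = NodeConfig(cores=8, disks=4, tb_per_disk=1.0, ram_gb=16)
newer = NodeConfig(cores=12, disks=12, tb_per_disk=2.0, ram_gb=48)

for name, node in [("older", older), ("newer", newer)]:
    print(f"{name}: {node.spindles_per_core():.2f} spindles/core, "
          f"{node.tb_per_core():.2f} TB/core, "
          f"{node.ram_gb_per_core():.1f} GB RAM/core")
```

In this made-up comparison, spindles per core double (0.5 to 1.0) even though each disk is also twice as large, so storage per core quadruples.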
The essence of an application
Once upon a time, information technology was strictly about — well, information. And by “information” what was meant was “data”.* An application boiled down to a database design, plus a straightforward user interface, in whatever the best UI technology of the day happened to be. Things rarely worked quite as smoothly as the design-database/press-button/generate-UI propaganda would have one believe, but database design was clearly at the center of application invention.
*Not coincidentally, two of the oldest names for “IT” were data processing and management information systems.
Eventually, there came to be three views of the essence of IT:
- Data — i.e., the traditional view, still exemplified by IBM and Oracle.
- People empowerment — i.e., Microsoft-style emphasis on UI friendliness and efficiency.
- Operational workflow — i.e., SAP-style emphasis on actual business processes.
Graphical user interfaces were a major enabling technology for that evolution. Equally important, relational databases made some difficult problems easy(ier), freeing application designers to pursue more advanced functionality.
Based on further technical evolution, specifically in analytic and consumer technologies, I think we should now take that list up to five. The new members I propose are:
- Investigative analytics.
- Emotional response.
Categories: Data warehousing, Facebook, Predictive modeling and advanced analytics, Theory and architecture, Web analytics
Notes from the Fusion-io S-1 filing
Fusion-io has filed for an initial public offering. Public offerings come with S-1 filings, which, along with 10-Ks, are the kinds of SEC filings that typically contain a few nuggets of business information. Notes from Fusion-io’s S-1 include:
- Fusion-io is growing very, very fast, doubling or better in revenue every 6 months.
- Fusion-io’s marketing message revolves around “data centralization”. Fusion-io is competing against storage-area networks and storage arrays.
- Fusion-io’s list of application types includes “… systems dedicated to decision support, high performance financial analysis, web search, content delivery and enterprise resource planning.”
- Fusion-io says it has shipped over 20 petabytes of storage.
- Fusion-io has a shifting array of big customers, including OEMs:
Categories: Analytic technologies, Data warehousing, Facebook, Solid-state memory, Storage
Introduction to Syncsort and DMExpress
Let’s start with some Syncsort basics.
- Syncsort was founded in 1968.
- As you might guess from its name and age, Syncsort started out selling software for IBM mainframes, used for sorting data. However, for the past 30 or so years, Syncsort’s products have gone beyond sort to also do join, aggregation, and merge. This was the basis for Syncsort’s expansion into the more general ETL (Extract/Transform/Load) business.
- As you might further guess, along the way there was a port to UNIX, development of a GUI (Graphical User Interface), and a change of ownership as Syncsort’s founder more or less cashed out.
- At this point, Syncsort sees itself primarily as a data integration/ETL company, whose main claim to fame is performance, with further claims of linear scaling and no manual tuning.*
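Part of why a sort product grows naturally into join, aggregation, and merge is that once records are sorted by key, each of those operations becomes a single streaming pass. The sketch below is a generic, in-memory illustration using only Python’s standard library; it is not Syncsort’s code or API, and the function names and sample data are hypothetical. Production ETL engines do the same things externally, on disk, at far larger scale.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_sorted(*runs, key):
    """k-way merge of runs that are each already sorted by `key`
    (the merge phase of an external sort)."""
    return list(heapq.merge(*runs, key=key))

def aggregate_sorted(records, key, value):
    """Sum `value` per key in one pass; correct only because sorted
    input places equal keys next to each other."""
    return [(k, sum(value(r) for r in group))
            for k, group in groupby(records, key=key)]

def merge_join(left, right, key):
    """Inner sort-merge join of two lists, each sorted by `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == rk:
                j_end += 1
            out.extend((lrow, rrow)
                       for lrow in left[i:i_end]
                       for rrow in right[j:j_end])
            i, j = i_end, j_end
    return out

# Hypothetical sample data: (customer, amount) runs and a customer->city table.
k = itemgetter(0)
run1 = sorted([("zeta", 1), ("acme", 5)], key=k)
run2 = sorted([("beta", 2), ("acme", 3)], key=k)
merged = merge_sorted(run1, run2, key=k)
totals = aggregate_sorted(merged, key=k, value=itemgetter(1))
cities = [("acme", "Boston"), ("beta", "Austin")]
joined = merge_join(totals, cities, key=k)
print(totals)  # [('acme', 8), ('beta', 2), ('zeta', 1)]
print(joined)
```

The design point is that the expensive step is the sort; after it, merge, aggregation, and join each touch every record once.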
One of Syncsort’s favorite value propositions is to contrast the cost of doing ETL in Syncsort, on commodity hardware, to the cost of doing ELT (Extract/Load/Transform) on high-end Teradata gear.
Categories: Data integration and middleware, Database compression, EAI, EII, ETL, ELT, ETLT, Specific users, Syncsort
Teradata, Aster Data, and Teradata/Aster
Teradata is acquiring Aster Data. Naturally, the deal is being presented with a Treaty of Tordesillas kind of positioning — Teradata does X, Aster Data does Y, and everybody looks forward to having X and Y in the same product portfolio. That said, my initial positioning and product strategy thoughts on the Teradata/Aster combination go something like this.
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, RDF and graphs, Specific users, Teradata
Cassandra company DataStax (formerly Riptano) is on track
Riptano, the Cassandra company, has changed its name to DataStax. DataStax has opened headquarters in Burlingame and hired some database-experienced folks – notably Ben Werther from Greenplum and Michael Weir from ParAccel, with Zenobia Godschalk (who worked with Aster Data) somewhere in the outside PR mix. Other than that, what’s new at DataStax is pretty much what could have been expected based on what DataStax folks said last spring.
Most notably, DataStax is introducing a software offering, whose full name is DataStax OpsCenter for Apache Cassandra. It seems to be, in essence, a monitoring tool for Cassandra clusters, with a bit of capacity planning bundled in. (If DataStax OpsCenter has any outright operational features, they got overlooked in our conversation.)*
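For a sense of what cluster capacity planning involves, here is the most basic arithmetic: raw data times replication factor, divided by the usable disk per node. This is a generic sketch, not how OpsCenter does it. The function name is hypothetical, and the default `max_fill=0.5` is an assumption reflecting common advice of that era to leave Cassandra disks about half empty as headroom for compaction.

```python
import math

def nodes_needed(raw_data_tb: float, replication_factor: int,
                 disk_tb_per_node: float, max_fill: float = 0.5) -> int:
    """Rough node count for a replicated cluster (hypothetical helper).

    max_fill is the fraction of each node's disk we allow data to occupy;
    the rest is headroom (assumed here to be needed for compaction).
    """
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    stored_tb = raw_data_tb * replication_factor     # every replica uses disk
    usable_tb_per_node = disk_tb_per_node * max_fill
    return math.ceil(stored_tb / usable_tb_per_node)

print(nodes_needed(10, 3, 2.0))                # 30
print(nodes_needed(10, 3, 2.0, max_fill=0.75))  # 20
```

Real planning tools would also account for growth rates, throughput, and memory, which this sketch ignores.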
Categories: Cassandra, DataStax, Market share and customer counts, NoSQL, Specific users, Telecommunications
The technology of privacy threats
This post is the second of a series. The first one was an overview of privacy dangers, replete with specific examples of kinds of data that are stored for good reasons, but can also be repurposed for more questionable uses. More on this subject may be found in my August 2010 post “Big Data is Watching You!”
There are two technology trends driving electronic privacy threats. Taken together, these trends raise scenarios such as the following:
- Your web surfing behavior indicates you’re a sports car buff, and you further like to look at pictures of scantily-clad young women. A number of your Facebook friends are single women. As a result, you’re deemed a risk to have a mid-life crisis and divorce your wife, thus increasing the interest rate you have to pay when refinancing your house.
- Your cell phone GPS indicates that you drive everywhere, instead of walking. There is no evidence of you pursuing fitness activities, but forum posting activity suggests you’re highly interested in several TV series. Your credit card bills show that your taste in restaurant food tends to the fatty. Your online photos make you look fairly obese, and a couple have ashtrays in them. As a result, you’re judged a high risk of heart attack, and your medical insurance rates are jacked up accordingly.
- You did actually have that mid-life crisis and get divorced. At the child-custody hearing, your ex-spouse’s lawyer quotes a study showing that football-loving upper income Republicans are 27% more likely to beat their children than yoga-class-attending moderate Democrats, and the probability goes up another 8% if they ever bought a jersey featuring a defensive lineman. What’s more, several of the more influential people in your network of friends also fit angry-male patterns, taking the probability of abuse up another 13%. Because of the sound statistics behind such analyses, the judge listens.
Not all these stories are quite possible today, but they aren’t far off either.
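The made-up figures in the child-custody scenario stack: 27%, then another 8%, then another 13%. The scenario doesn’t say whether those increases multiply or add, so the sketch below computes both readings relative to the yoga-class baseline. The helper names are hypothetical; the point is simply that several individually modest correlations combine into a much larger-looking number.

```python
import math

def compound(relative_increases):
    """Multiplicative reading: each increase scales the previous figure
    (0.27 means +27%)."""
    return math.prod(1 + r for r in relative_increases)

def additive(relative_increases):
    """Additive reading: increases are summed against the baseline."""
    return 1 + sum(relative_increases)

steps = [0.27, 0.08, 0.13]   # the scenario's hypothetical figures
print(f"multiplicative: {compound(steps):.3f}x baseline")  # 1.550x
print(f"additive:       {additive(steps):.3f}x baseline")  # 1.480x
```

Either way, the stacked inference puts the subject roughly half again above the baseline, on the strength of a jersey purchase and a friends list.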
Categories: Facebook, Predictive modeling and advanced analytics, Surveillance and privacy, Telecommunications, Web analytics
Privacy dangers — an overview
This post is the first of a series. The second one delves into the technology behind the most serious electronic privacy threats.
The privacy discussion has gotten more active, and more complicated as well. A year ago, I still struggled to get people to pay attention to privacy concerns at all, at least in the United States, with my first public breakthrough coming at the end of January. But much has changed since then.
On the commercial side, Facebook modified its privacy policies, garnering great press attention and an intense user backlash, leading to a quick partial retreat. The Wall Street Journal then launched a long series of articles — 13 so far — recounting multiple kinds of privacy threats. Other media joined in, from Forbes to CNet. Various forms of US government rule-making to inhibit advertising-related tracking have been proposed as an apparent result.
In the US, the government had a lively year as well. The Transportation Security Administration (TSA) rolled out what have been dubbed “porn scanners,” and backed them up with “enhanced patdowns.” For somebody who is, for example, female, young, a sex abuse survivor, and/or a follower of certain religions, those can be highly unpleasant, if not traumatic. Meanwhile, the Wikileaks/Cablegate events have spawned a government reaction whose scope is only beginning to be seen. A couple of “highlights” so far are some very nasty laptop seizures, and the recent demand for information on over 600,000 Twitter accounts. (Christopher Soghoian provided a detailed, nuanced legal analysis of that demand.)
At this point, it’s fair to say there are at least six different kinds of legitimate privacy fear.
Categories: Analytic technologies, Facebook, GIS and geospatial, Health care, Surveillance and privacy, Telecommunications, Web analytics
Notes and links October 22, 2010
A number of recent posts have had good comments. This time, I won’t call them out individually.
Evidently Mike Olson of Cloudera is still telling the machine-generated data story, exactly as he should be. The Information Arbitrage/IA Ventures folks said something similar, focusing specifically on “sensor data” …
… and, even better, went on to say: