January 30, 2015

Growth in machine-generated data

In one of my favorite posts, namely When I am a VC Overlord, I wrote:

I will not fund any entrepreneur who mentions “market projections” in other than ironic terms. Nobody who talks of market projections with a straight face should be trusted.

Even so, today I got talked into putting on the record a prediction that machine-generated data will grow at more than 40% annually for a while.

My reasons for this opinion are little more than:

- I was referring to the creation of such data, but the growth rates of new creation and of persistent storage are likely, at least at this back-of-the-envelope level, to be similar.
- Anecdotal evidence actually suggests 50-60%+ growth rates, so >40% seemed like a responsible claim.

Comments

7 Responses to “Growth in machine-generated data”

  1. David Gruzman on February 1st, 2015 9:45 am

    I am wondering if all this data is valuable enough to be stored. For example, it might be hard to justify storing temperature-sensor data at one-minute resolution for more than a few weeks.
    In other words — I am not sure that growth in the amount of produced data will be reflected in growth in the amount of stored and analyzed data.

  2. Curt Monash on February 1st, 2015 5:58 pm

    David,

    It is very unlikely to all be stored. We couldn’t pay for storing it all today. As storage gets cheaper (Moore’s Law/Kryder’s Law), volumes will increase further (Moore’s Law/subject of this post). So if we can’t afford to keep everything now, we also won’t be able to afford doing so in the future.

    That said, 1 minute temperature readings aren’t the best example, because those don’t really take a lot of volume.
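Curt's affordability point is a compounding argument, and can be sketched numerically. The rates below are assumed for illustration only (they are not figures from the post): data creation growing 40%/year while cost per GB falls 25%/year.

```python
# Sketch of the compounding argument, with assumed illustrative rates:
# data volume grows 40%/year; storage cost per GB falls 25%/year.
data_growth = 1.40    # assumed annual growth factor for data created
cost_decline = 0.75   # assumed annual factor for cost per GB (a 25% drop)

volume = 1.0          # normalized data created in year 0
cost_per_gb = 1.0     # normalized cost per GB in year 0
for year in range(1, 6):
    volume *= data_growth
    cost_per_gb *= cost_decline
    bill = volume * cost_per_gb   # relative cost of storing that year's data
    print(f"year {year}: relative storage bill = {bill:.2f}")

# The bill compounds at 1.40 * 0.75 = 1.05, i.e. +5%/year — so under these
# assumed rates, if storing everything is unaffordable now, it stays so.
```

Under any assumptions where volume grows faster than cost declines, the product compounds upward, which is the heart of the argument.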

  3. Peter Fretty on February 2nd, 2015 12:25 pm

    I wouldn’t doubt the 40 percent growth rate, considering the number of connected machines now generating data that was not previously collected. I agree with David: if the data isn’t stored for future analysis, how are you leveraging the value in the data? Is it immediately analyzed? Are summaries created and stored from the collected data?

    Peter Fretty, IDG blogger posting on behalf of SAS

  4. joseph on February 2nd, 2015 8:01 pm

    One factor is affordability. Storage gets cheaper every year, and we need to find a way to utilize it.

  5. Curt Monash on February 3rd, 2015 11:20 am

    Peter,

    If the data isn’t all being stored, then summaries, highlights and/or samples surely should be.

    Event detection is one term I’ve heard used in that connection. Another is data reduction, which is a different sense of the term than “choose the most useful variables on which to base a predictive model”.

  6. David Gruzman on February 3rd, 2015 12:34 pm

    It would be nice to be able to define a “value per GB” measure for different types of data. Graphing that value alongside the storage price curve would enable us to predict which types of data will be stored in the future.
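David's "value per GB" idea can be sketched as a toy model. Everything below — the function name, the starting cost, the value figures, and the 25%/year price decline — is made up for illustration: a data type becomes worth storing once the falling cost per GB drops below its value per GB.

```python
# Hypothetical sketch of a "value per GB" threshold: store a data type
# only once the (falling) storage cost per GB drops below its value per GB.
# All numbers here are invented for illustration.
def years_until_worth_storing(value_per_gb, cost_per_gb=0.03,
                              annual_cost_decline=0.25, horizon=30):
    """Return the first year in which the assumed storage cost per GB
    falls to or below the data's value per GB, or None within the horizon."""
    for year in range(horizon + 1):
        if cost_per_gb <= value_per_gb:
            return year
        cost_per_gb *= (1 - annual_cost_decline)
    return None

# e.g. low-value sensor logs vs. higher-value records (made-up values):
print(years_until_worth_storing(0.001))  # worth storing only years from now
print(years_until_worth_storing(0.05))   # already worth storing today
```

The crossover year is what a “value per GB vs. storage price” graph would read off visually.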

  7. BI for NoSQL — some very early comments | DBMS 2 : DataBase Management System Services on March 17th, 2015 7:40 am

    […] have more data — presumably machine-generated — than you can afford to […]
