Notes, links and comments

August 6, 2012

I haven’t done a notes/link/comments post for a while. Time for a little catch-up.

1. MySQL now has a memcached integration story. I haven’t checked the details. The MySQL team is pretty hard to talk with, due to the heavy-handedness of Oracle’s analyst relations.

2. The Large Hadron Collider offers some serious numbers, including:

1 petabyte/second.
6 x 10⁹ collisions/second.
Only 1 in 10¹³ collision records kept (which I guess knocks things down to a 100 byte/second average, from the standpoint of persistent storage).
Real-time filtering by a cluster of several thousand machines, over a 25 nanosecond period.
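The 100 byte/second estimate is simply the raw data rate divided by the filtering ratio. Below is a minimal sketch of that arithmetic. It assumes a decimal petabyte (10^15 bytes) and that kept records are no larger or smaller than average. The function name is illustrative only.

```python
# Back-of-the-envelope check of the LHC storage figure quoted above.
# Assumption: "petabyte" means the decimal unit, 10**15 bytes.

RAW_BYTES_PER_SECOND = 10**15   # 1 petabyte/second of raw collision data
KEEP_ONE_IN = 10**13            # only 1 in 10^13 collision records kept


def average_kept_rate(raw_bytes_per_second: int, keep_one_in: int) -> float:
    """Average bytes/second that reach persistent storage after filtering,
    assuming (hypothetically) that kept records are average-sized."""
    return raw_bytes_per_second / keep_one_in


kept = average_kept_rate(RAW_BYTES_PER_SECOND, KEEP_ONE_IN)
print(f"~{kept:.0f} bytes/second to persistent storage")  # ~100 bytes/second
```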

3. One application area we don’t talk about much for analytic technologies is education. However:

Knewton vigorously talks up the idea of online learning that adapts to the students’ previous responses, complete with the “Big Data” buzzword.
Knewton evidently likes graphs, and seems to be eagerly awaiting scale-out capabilities in Neo4j.
The New York Times offered a survey article about analytics in education. It seemed to focus on Arizona State University, where I attended the only educational software conference I’ve ever gone to, in approximately 1984. One concerning aspect: There was little reason to be sure that the outcomes they were working toward had much to do with actually better education.

So how soon will budgets emerge for all this, especially in the United States? I’m not sure.

Education has all sorts of problems at both the grade-school and collegiate levels, including bureaucratic weirdness and huge financial pressures.
Textbook publisher Macmillan is investing significant capital in education technology businesses … but diversifications of that kind have often gone wrong before.

4. Recent posts with robust comment threads — and this is a very partial list — include:

Pros and cons of Microsoft SQL Server were explored after I opined about SQL Server to MySQL migration.
There was a lot of commentary on my May series of graph analysis and management posts.
Later, Neo’s Philip Rathle added clarifying detail to my post on Neo Technology and Neo4j.
My June series on Hadoop drew numerous comments and clarifications too.
There was vigorous response when I suggested in May that “Big Data” might be overhyped …
… but nothing like what transpired when I said something similar in September 2011.

5. Finally — and thoroughly superseding my post on disk, flash, and RAM — I saw an awesome round-up of latency numbers, which I’ll just quote below:

    L1 cache reference ............................. 0.5 ns
    Branch mispredict ................................ 5 ns
    L2 cache reference ............................... 7 ns
    Mutex lock/unlock ............................... 25 ns
    Main memory reference .......................... 100 ns
    Compress 1K bytes with Zippy ................. 3,000 ns
    Send 2K bytes over 1 Gbps network ........... 20,000 ns
    SSD random read ............................ 150,000 ns
    Read 1 MB sequentially from memory ......... 250,000 ns
    Round trip within same datacenter .......... 500,000 ns
    Read 1 MB sequentially from SSD* ......... 1,000,000 ns
    Disk seek ............................... 10,000,000 ns
    Read 1 MB sequentially from disk ........ 20,000,000 ns
    Send packet CA -> Netherlands -> CA .... 150,000,000 ns

Repeating that in different units, it’s:

    L1 cache reference ......................... 0.5 ns
    Branch mispredict ............................ 5 ns
    L2 cache reference ........................... 7 ns
    Mutex lock/unlock ........................... 25 ns
    Main memory reference ...................... 100 ns
    Compress 1K bytes with Zippy ................. 3 µs
    Send 2K bytes over 1 Gbps network ........... 20 µs
    SSD random read ............................ 150 µs
    Read 1 MB sequentially from memory ......... 250 µs
    Round trip within same datacenter .......... 0.5 ms
    Read 1 MB sequentially from SSD* ............. 1 ms
    Disk seek ................................... 10 ms
    Read 1 MB sequentially from disk ............ 20 ms
    Send packet CA -> Netherlands -> CA ........ 150 ms
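Converting the nanosecond figures into friendlier units is mechanical, and the same numbers also imply sequential-read throughput for each medium. Below is a minimal Python sketch that treats the quoted figures as exact. The unit-selection rule (use the largest unit in which the value is at least 0.5) is an assumption. It happens to reproduce the second listing, and the helper names are hypothetical.

```python
# Restate the latency figures quoted above in ns/µs/ms, and derive the
# sequential throughput they imply. The dict only restates the quoted numbers
# (the footnote asterisk on the SSD entry is dropped); helper names are
# illustrative, not from any library.

LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "SSD random read": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
    "Send packet CA -> Netherlands -> CA": 150_000_000,
}

# Largest unit first; each entry is (suffix, nanoseconds per unit).
UNITS = (("ms", 1_000_000), ("µs", 1_000), ("ns", 1))


def humanize(ns: float) -> str:
    """Pick the largest unit in which the value is at least 0.5 (assumed rule)."""
    for suffix, scale in UNITS:
        if ns / scale >= 0.5:
            return f"{ns / scale:g} {suffix}"
    return f"{ns:g} ns"  # anything below 0.5 ns stays in nanoseconds


def mb_per_second(ns_per_mb: float) -> float:
    """Sequential throughput implied by the time to read 1 MB."""
    return 1e9 / ns_per_mb


for name, ns in LATENCY_NS.items():
    print(f"{name:<36} {humanize(ns):>8}")

for medium in ("memory", "SSD", "disk"):
    ns = LATENCY_NS[f"Read 1 MB sequentially from {medium}"]
    print(f"{medium:<6} ~{mb_per_second(ns):,.0f} MB/s")
```

Taken at face value, these figures put sequential reads at roughly 4,000 MB/s from memory, 1,000 MB/s from SSD and 50 MB/s from disk. That is the same 20x SSD-to-disk gap visible in the 1 ms and 20 ms entries.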

Categories: Cache, memcached, Memory-centric data management, MySQL, Open source, Petabyte-scale data management, RDF and graphs, Scientific research


Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
