Notes, links and comments
August 6, 2012
I haven’t done a notes/links/comments post for a while. Time for a little catch-up.
1. MySQL now has a memcached integration story. I haven’t checked the details. The MySQL team is pretty hard to talk with, due to the heavy-handedness of Oracle’s analyst relations.
2. The Large Hadron Collider offers some serious numbers, including:
- 1 petabyte/second.
- 6 x 10^9 collisions/second.
- Only 1 in 10^13 collision records kept (which I guess knocks things down to a 100 byte/second average, from the standpoint of persistent storage).
- Real-time filtering by a cluster of several thousand machines, over a 25 nanosecond period.
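The 100 byte/second guess above is just division, and a quick sketch makes the assumptions explicit. It assumes a decimal petabyte (10^15 bytes) and that kept records are, on average, the same size as discarded ones; neither assumption comes from the LHC figures themselves, and the function name is made up for illustration.

```python
# Figures quoted in the list above.
RAW_BYTES_PER_SECOND = 10**15   # 1 petabyte/second, assuming a decimal petabyte
KEEP_ONE_IN = 10**13            # only 1 in 10^13 collision records kept


def persisted_bytes_per_second(raw=RAW_BYTES_PER_SECOND, keep_one_in=KEEP_ONE_IN):
    """Average rate reaching persistent storage, assuming kept records
    are the same average size as discarded ones (an assumption)."""
    return raw / keep_one_in


print(persisted_bytes_per_second())  # 100.0
```

If kept records turn out to be larger or smaller than average, the persisted rate scales accordingly, so treat the result as an order-of-magnitude figure.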
3. One application area we don’t talk about much for analytic technologies is education. However:
- Knewton vigorously talks up the idea of online learning that adapts to the students’ previous responses, complete with the “Big Data” buzzword.
- Knewton evidently likes graphs, and seems to be eagerly awaiting scale-out capabilities in Neo4j.
- The New York Times offered a survey article about analytics in education. It seemed to be focused on Arizona State University — where I attended the only educational software conference I’ve ever gone to, in approximately 1984. One concerning aspect: There didn’t seem to be any reason to be sure the outcomes they were working toward had much to do with an actually better education.
So how soon will budgets emerge for all this, especially in the United States? I’m not sure.
- Education has all sorts of problems at both the grade-school and collegiate levels, including bureaucratic weirdness and huge financial pressures.
- Textbook publisher Macmillan is investing significant capital in education technology businesses … but diversifications of that kind have often gone wrong before.
4. Recent posts with robust comment threads — and this is a very partial list — include:
- Pros and cons of Microsoft SQL Server were explored after I opined about SQL Server to MySQL migration.
- There was a lot of commentary on my May series of graph analysis and management posts.
- Later, Neo’s Philip Rathle added clarifying detail to my post on Neo Technology and Neo4j.
- My June series on Hadoop drew numerous comments and clarifications too.
- There was vigorous response when I suggested in May that “Big Data” might be overhyped …
- … but nothing like what transpired when I said something similar in September, 2011.
5. Finally — and thoroughly superseding my post on disk, flash, and RAM — I saw an awesome round-up of latency numbers, which I’ll just quote below:
L1 cache reference …………………………………………………… 0.5 ns
Branch mispredict ……………………………………………………….. 5 ns
L2 cache reference ………………………………………………………. 7 ns
Mutex lock/unlock ………………………………………………………. 25 ns
Main memory reference …………………………………………… 100 ns
Compress 1K bytes with Zippy ……………………………… 3,000 ns
Send 2K bytes over 1 Gbps network …………………… 20,000 ns
SSD random read ……………………………………………….. 150,000 ns
Read 1 MB sequentially from memory ……………….. 250,000 ns
Round trip within same datacenter …………………… 500,000 ns
Read 1 MB sequentially from SSD* ………………… 1,000,000 ns
Disk seek ……………………………………………………….. 10,000,000 ns
Read 1 MB sequentially from disk ……………….. 20,000,000 ns
Send packet CA -> Netherlands -> CA ……… 150,000,000 ns
Repeating that in different units, it’s:
L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns
Compress 1K bytes with Zippy ................. 3 µs
Send 2K bytes over 1 Gbps network ........... 20 µs
SSD random read ............................ 150 µs
Read 1 MB sequentially from memory ......... 250 µs
Round trip within same datacenter .......... 0.5 ms
Read 1 MB sequentially from SSD* ............. 1 ms
Disk seek ................................... 10 ms
Read 1 MB sequentially from disk ............ 20 ms
Send packet CA -> Netherlands -> CA ....... 150 ms
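For anybody who wants to redo the unit conversion, here is a minimal sketch. The numbers are copied from the nanosecond table above; the `humanize` helper and its threshold rule are my own illustration, not part of the quoted round-up. One visible difference: this simple rule renders 500,000 ns as 500 µs, where the table above says 0.5 ms.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCIES_NS = [
    ("L1 cache reference", 0.5),
    ("Branch mispredict", 5),
    ("L2 cache reference", 7),
    ("Mutex lock/unlock", 25),
    ("Main memory reference", 100),
    ("Compress 1K bytes with Zippy", 3_000),
    ("Send 2K bytes over 1 Gbps network", 20_000),
    ("SSD random read", 150_000),
    ("Read 1 MB sequentially from memory", 250_000),
    ("Round trip within same datacenter", 500_000),
    ("Read 1 MB sequentially from SSD", 1_000_000),
    ("Disk seek", 10_000_000),
    ("Read 1 MB sequentially from disk", 20_000_000),
    ("Send packet CA -> Netherlands -> CA", 150_000_000),
]


def humanize(ns):
    """Render a nanosecond count in the largest unit that keeps the value >= 1."""
    if ns >= 1_000_000:
        return f"{ns / 1_000_000:g} ms"
    if ns >= 1_000:
        return f"{ns / 1_000:g} µs"
    return f"{ns:g} ns"


for name, ns in LATENCIES_NS:
    # Pad the name with dots to a fixed width, as in the table above.
    print(f"{name + ' ':.<45} {humanize(ns)}")
```

The same list also makes ratios easy to check. For example, a disk seek at 10,000,000 ns is 100,000 times a main memory reference at 100 ns.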