Logless, lockless Netezza more carefully explained
I talked at length with Bill Blake and Doug Johnson of Netezza today. (Bill is exactly the guy I complained of previously having had my access cut off to.) One takeaway was a clarification of their approach to transactions, which sounds even cooler than I first thought. It’s actually not a new idea; they just timestamp rows with CreateIDs and DeleteIDs, then exploit those to the hilt. Actually, it seems like this approach would be interesting in OTLP as well, although I’m not aware of it being used in any of the more successful OLTP DBMS systems. (Yes, this is an open invitation to fans of less-established DBMS products to tell me of their virtues, preferably in a flame-free manner.)
Here, as best I currently understand it, is how Netezza handles several of the functions one would expect from a transactional DBMS.
Concurrency, serializability, read repeatability, and locking. A query only returns rows that had been committed before the query began. A “visibility list” of updates in progress while the query is underway is maintained so that any contending results can be filtered out (I presume this is a performance feature). There’s no locking code whatsoever.
I forgot to ask what happens when two transactions try to update the same row, but I presume it’s a standard two-phase commit, and that there’s no need to get particularly good performance in such cases because the situation almost never arises.
Rollback, rollforward, logging. There’s no log. Rollback happens by slapping DeleteIDs on any row that has too recent of a CreateID.
Point-in-time snapshotting. We didn’t talk about the use of time stamps for point-in-time snapshotting (e.g., for compliance), but the possibilities are obvious.
All this sounds like great stuff. So where’s the catch? Well, you obviously have to put two columns into each table for the timestamp IDs. But while those columns have to be stored on disk, they rarely will be a burden on the microprocessors or data movement subsystems, since the FPGA will filter them out as soon as the get off of disk. And as a pure storage burden, the timestamps are much less of a deal than logs would be.
As for any direct downside to not keeping logs – besides their use in rollback/rollforward and the like, logs are good for replication and as an added copy of the data for recovery purposes. I don’t see why either factor would be a big deal for most Netezza customers.
Comments
5 Responses to “Logless, lockless Netezza more carefully explained”
Leave a Reply
Curt,
If I understand this right, there’s no real way to have multiple ‘transactions’ at the same time, which would seem to be a significant limitation if multiple users want to do updates etc. at the same time (which is remarkably common, even in DW systems).
Also, does this mean that an updated row effectively changes position on the disk? If so, how does this affect Netezza’s zone maps? Zone maps only work well in systems that load data in strict date order (thereby providing a kind of date partitioning). If the row order is changed later, the performance benefits of zone maps will degrade over time.
Stuart
DATAllegro
[…] Timestamps are used for inserts and deletes; otherwise, there are no data changes. (Without that kind of approach, the update strategy in Point #2 couldn’t be viable.) A big benefit to these timestamps is that you can assure integrity via “snapshot isolation”; i.e., by a virtual rollback to a recent point in time. Thus, Vertica can get away without any kind of locks or, for that matter, transaction/redo logs. Row-oriented Netezza uses a similar logless, lockless approach. […]
“A query only returns rows that had been committed before the query began.”
So if transaction A updates a counter (from 0 to 1), and sooner than it manages to commit, transaction B (which started after A) updates the counter again (from 0 to 1), the counter is still 1. Great! But in true ACID the counter would be 2.
[…] which for now seems to mean recognizing which rows are and aren’t valid under Netezza’s form of MVCC (MultiVersion Concurrency […]
re: asdf
This comment sat for months without the obvious reply: You’re ignoring commits. Oversimplifying: While the two transactions are both active, they will each think they incremented the counter from 0 to 1. There’s no problem with that; *uncommitted* transactions aren’t serializable and don’t need to maintain global consistency. When it comes time to commit, the rules is: You can only *commit* a value if the timestamp on it matches the timestamp it had when you read it. So the first transaction will succeed in committing, while the second will abort. ACID is maintained.
The logic here is that a transaction acts as if it’s working on a snapshot of the database as it existed at the time the transaction began. It can neither read nor write anything that is more recent than the snapshot. Attempted reading can be handled by looking at history; attempted writing – “changing history” – forces an abort.
Three things to consider:
– The work done by the aborted transaction is lost – as is the case with *any* aborted transaction. A clash of timestamps like this is the analogue of a deadlock in a lock-based system. MVCC systems are called “optimistic” because they assume such clashes will be rare enough not to matter – just as a lock-based system assumes deadlocks are rare. Both systems remain correct if their assumptions are violated, but performance disintegrates.
– There are optimizations to avoid wasting too much work. If read the counter’s timestamp “with intent to lock”, we can mark the counter to say that, and a second “read with intent to lock” can abort right away. Or the system can abort the new of the two transactions, or the one that has fewer pending writes, or whatever. Whether these approaches help or hurt has to be determined.
– There are more subtle interactions: Transaction A reads and increments counter; transaction B reads counter and writes it to counter2. If A commits before B does, B won’t abort, but we lose global serialization with respect to the timestamps (and if you have another counter and another transaction, there may be no consistent serial order at all). Fully developed MVCC systems have to deal with this kind of thing – or you can decide that the really strong “global semantic consistency” being violated here is more than you care about. (You don’t always get it with locking systems either.)