January 28, 2009
More Oracle notes
When I went to Oracle in October, the main purpose of the visit was to discuss Exadata. And so my initial post based on the visit was focused accordingly. But there were a number of other interesting points I’ve never gotten around to writing up. Let me now remedy that, at least in part.
- Oracle has offered compression since 9i Release 2. It’s token/dictionary compression on columns or groups of columns (see the DDL sketch after this list). Oracle says 2-3X compression is common and 8X is not unheard of. Also, Oracle has always had variable-length rows, which can make its data somewhat more compact to start with than some other vendors’.
- Interestingly, Oracle executes SQL statements directly on compressed data.
- In general, Oracle says its compression has very little overhead.
- Oracle has offered transparent encryption since 10g Release 2, on whole columns or individual tables. I didn’t ask about encryption performance.
- In 11g Release 1, Oracle rearchitected its LOB (Large OBjects) structure for performance, and perhaps functionality as well. Andy Mendelsohn believes performance now is the same as that of raw file systems. Supported features include encryption, compression, and deduplication. Applications for this new functionality include content management such as video or images and — which I found surprising — spatial data. But it’s not so relevant to text and OLAP data, even though technically those are stored in LOBs as well.
- Actually, Oracle stores OLAP in a true MOLAP array. (I recall that that integration took years.)
- Speaking of Oracle’s geospatial functionality, it’s used heavily in ERP. (I didn’t probe for details.) Oracle also says “all” the mapping vendors are Oracle customers (partners?).
- In Ray Roccaforte’s view, the two greatest drivers of data warehouse growth are consolidation (merger-related or otherwise) and web data.
- Oracle believes that, at least among general-purpose DBMS, its main product has the only bitmap indexes actually stored on disk (rather than built on the fly at query time), and by far the best star optimizations (also sketched after this list).
- While conceding that table scans are important, and that Oracle had bottlenecks in that regard which Exadata is designed to fix, Ray insists there are some applications for which star optimizations really are needed.
- Query pipelining is limited in Oracle, and the optimizer is not geared to optimize streams of queries.
- However, Oracle’s optimizer is sufficiently self-aware to notice when a query runs long and try to do things differently “next time.” For example, it might do more sampling if the statistics proved unreliable, or might take the time to search a bigger solution space.
- Oracle says it already does a lot of CEP-like things to refresh overlapping materialized views simultaneously. (That makes sense.) Oracle is working on truer complex event processing.
- Complexity data point: Oracle’s and SAP’s application suites each have >100,000 tables and >1,000,000 distinct SQL statements.
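To make the storage-related bullets above more concrete, here is a minimal DDL sketch of the features in question (table and column names are hypothetical, and exact syntax varies somewhat by release):

```sql
-- 9iR2-style basic table compression (11g adds COMPRESS FOR ALL OPERATIONS for OLTP workloads)
CREATE TABLE sales_fact (
  sale_id    NUMBER,
  cust_id    NUMBER,
  amount     NUMBER(12,2),
  sale_date  DATE
) COMPRESS;

-- 10gR2 transparent data encryption on an individual column
CREATE TABLE customers (
  cust_id  NUMBER,
  name     VARCHAR2(100),
  ssn      VARCHAR2(11) ENCRYPT
);

-- 11gR1 SecureFile LOB storage, with compression and deduplication enabled
CREATE TABLE documents (
  doc_id  NUMBER,
  body    BLOB
) LOB (body) STORE AS SECUREFILE (COMPRESS DEDUPLICATE);
```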
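The bitmap-index and star-optimization point similarly boils down to structures and a setting along these lines (again a hypothetical sketch):

```sql
-- Bitmap indexes on the fact table's dimension keys are persisted on disk,
-- rather than being generated at query time
CREATE BITMAP INDEX sales_cust_bix ON sales_fact (cust_id);
CREATE BITMAP INDEX sales_date_bix ON sales_fact (sale_date);

-- Star transformation rewrites star-schema joins to use those bitmap indexes
ALTER SESSION SET star_transformation_enabled = TRUE;
```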
There were a couple of notes about text analytics as well, but I’ll blog about those separately.
Categories: Data types, Data warehousing, Database compression, GIS and geospatial, MOLAP, Oracle, SAP AG, Streaming and complex event processing (CEP), Theory and architecture, Web analytics
Comments
9 Responses to “More Oracle notes”
I recently saw Exadata in action and also attended a technical session on Exadata. I can validate a couple of things.
Compression – In 11g compression is greatly enhanced and works very differently from prior versions. For one, compression takes place in the background, and according to Oracle the overall impact on a very active system is about 3% (i.e., the background process, if it runs all the time because you are actively inserting/updating data, can have roughly a 3% impact). Writes see no impact because nothing is compressed when data is written – it is compressed in the background later. Reads see improved performance because they have to scan fewer blocks and more blocks fit in memory. Plus, the compression algorithm can compress data across columns. The same algorithm is used for compressing backup files. Compression can be enabled at the tablespace level, table level, or even a partition level.
Exadata – From what I have seen Exadata has massive I/O capabilities. Each database machine (8 DB nodes + 14 Exadata storage servers) can sustain 10-14 GB of I/O per second. As a result you don’t need many indices (which Oracle traditionally required to get decent performance). If you do massive sequential I/O, the need for random I/O goes down. I saw a few queries on multi-terabyte tables come back in less than 5 minutes, which is unheard of in regular Oracle environments. The thing with Oracle is that they have excellent functionality (terrific PL/SQL engine, many SQL extensions, truly great concurrency model – a huge weakness of Teradata, many partitioning options, excellent compression in 11g, many high availability options, etc.) and with the new I/O capabilities of Exadata – they have a winner in the high-end DW area. A lot of Oracle shops that were forced to go to Teradata now have a real alternative.
Sanjay,
This was all demo, right? No production use?
Thanks for sharing!
CAM
Could you please clarify this statement:
However, Oracle’s optimizer is sufficiently self-aware to notice when a query runs long and try to do things differently “next time.” For example, it might do more sampling if the statistics proved unreliable, or might take the time to search a bigger solution space.
What is the particular feature you are talking about? IMO Oracle’s CBO cannot gather or regather stats on the fly unless the user requests that it do so.
I’m sorry, Timur, but I was taking Oracle’s word for that one. It didn’t seem like a terribly big deal, so I didn’t push for clarity as to the exact amount of DBA intervention required.
Perhaps somebody from Oracle can comment more precisely …
CAM
Curt,
Yes, this was a demo and not a production database. I did speak to a couple of customers (that either did a POC or are in the process of one) who were at the demo, and they confirmed some of the things I heard there.
As to the question from Timur – Yes Oracle can perform dynamic sampling of statistics through the use of a parameter called OPTIMIZER_DYNAMIC_SAMPLING. Zero indicates no dynamic sampling and 10 is for very aggressive dynamic sampling. Oracle can do dynamic sampling if stats are missing, out of date, or may lead to bad estimations. Of course dynamic sampling is only useful if the amount of time to sample is small compared to the query execution time.
Through dynamic sampling the optimizer may come up with a different plan.
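Roughly, the knob looks like this (an illustrative sketch; it can be set instance-wide, per session, or per query via a hint):

```sql
-- Instance-wide setting (0 = off, 10 = most aggressive; the 10g/11g default is 2)
ALTER SYSTEM SET optimizer_dynamic_sampling = 4;

-- Session-level override
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Per-query hint against a specific table alias
SELECT /*+ dynamic_sampling(t 4) */ COUNT(*)
FROM   sales_fact t
WHERE  sale_date >= DATE '2008-01-01';
```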
Sanjay,
I know about dynamic sampling. I mentioned that the “CBO cannot gather or regather stats on the fly unless the user requests that it do so.” But dynamic sampling is an instruction to the CBO, so it is not the same thing as in the quote. That is my point: it is not self-awareness, just a user request.
It may just be semantics, but I believe it is self-awareness. Once you tell the optimizer (ONE time, at the instance level, not per user or per query) to use sampling, the optimizer makes the decision on when to sample from that point on, until you turn that off again.
So while I agree that you have to inform the optimizer, you only do that ONCE.
Sanjay,
I think it’s a kind of perception… In my view it’s just a configuration issue.
And BTW, dynamic sampling (when it is set) is always applied by the CBO. It has some rules to follow (you can find them in the docs). There is no such condition as “when *a query runs long*, try to do things differently ‘next time.’” So I don’t think dynamic sampling is the feature that fits this description.
[…] reasons, it was a lot more work than one might at first guess. One such reason is that it spawned four related blog posts over the past few […]