In-database analytics — analytic glossary draft entry
This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!
Note: Words and phrases in italics will be linked to other entries when the glossary is complete.
“In-database analytics” is a catch-all term for analytic capabilities, beyond standard SQL, that run on the same machine as, and under the management of, an analytic DBMS. These can run in one or both of two modes:
- In-process or unfenced, i.e. in the same process as the DBMS itself. This option gives maximum performance, but any defect in the analytic code may crash the whole DBMS. It also generally requires that the code be written in the same language as the DBMS itself, typically C++.
- Out-of-process or fenced, i.e. in a separate process. This option sacrifices performance, in favor of reliability and language flexibility.
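The trade-off between the two modes can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS implements fencing: the function names `run_unfenced` and `run_fenced`, and the convention that the analytic code arrives as source text defining `udf(x)`, are assumptions made up for the example.

```python
import subprocess
import sys


def run_unfenced(udf, arg):
    """In-process call: no process boundary, so a hard crash in udf
    would take the calling (host) process down with it."""
    return udf(arg)


def run_fenced(udf_source, arg, timeout=10.0):
    """Out-of-process call: run udf_source (which must define udf(x)) in a
    child interpreter. Returns the result, or None if the child failed."""
    program = udf_source + "\nimport sys\nprint(udf(float(sys.argv[1])))\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program, repr(float(arg))],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # a hung UDF is contained as well
    if proc.returncode != 0:
        return None  # the UDF crashed; the host process keeps running
    return float(proc.stdout.strip())


# Hypothetical analytic code: one well-behaved, one that dies abruptly.
GOOD_UDF = "def udf(x):\n    return x * x\n"
BAD_UDF = "import os\ndef udf(x):\n    os._exit(3)  # simulate a hard crash\n"

fenced_ok = run_fenced(GOOD_UDF, 2.0)
fenced_crash = run_fenced(BAD_UDF, 2.0)
unfenced_ok = run_unfenced(lambda x: x * x, 3.0)
```

The fenced call pays for process startup and for passing arguments and results across the process boundary, which is the performance cost mentioned above. In exchange, the crashing function only produces a `None` result instead of ending the host process.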
In-database analytics may offer great performance and scalability advantages over the alternative of extracting the data and processing it on a separate server. This is particularly likely in MPP (Massively Parallel Processing) analytic DBMS environments.
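A small sketch of the difference, using Python's built-in `sqlite3` module as a stand-in. SQLite is neither an analytic DBMS nor MPP, so this shows only the data-movement argument, not real-world performance. The `readings` table, its contents, and the `variance` aggregate are all invented for the example.

```python
import sqlite3
import statistics
from collections import defaultdict


class Variance:
    """Population variance as a user-defined aggregate (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, value):
        if value is None:
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        return self.m2 / self.n if self.n else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("variance", 1, Variance)
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 2.0), ("b", 2.0), ("b", 8.0)],
)

# In-database: the engine runs the aggregate; one row per sensor comes back.
in_db_rows = conn.execute(
    "SELECT sensor, variance(value) FROM readings GROUP BY sensor"
).fetchall()
in_db = dict(in_db_rows)

# Extract-and-process: every row leaves the database before any analysis.
extracted_rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
by_sensor = defaultdict(list)
for sensor, value in extracted_rows:
    by_sensor[sensor].append(value)
external = {s: statistics.pvariance(v) for s, v in by_sensor.items()}
```

Both routes give the same answer, but the in-database query returns two rows while the extraction pulls all five. On large tables spread across many nodes, that gap in data movement is where much of the advantage would come from.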
Examples of in-database analytics include:
- Creating temporary data structures that persist past the life of a query.
- Creating temporary data structures that are non-tabular.
- Predictive modeling that runs on the same nodes of an MPP cluster where the data resides.
- Predictive analytics (scoring only).
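To make the scoring-only case concrete, here is a minimal sketch in which a model trained elsewhere is applied row by row inside the database. It uses Python's built-in `sqlite3` purely as a stand-in for an analytic DBMS. The logistic-regression coefficients, the `customers` table, and the function name `score` are hypothetical values chosen for illustration.

```python
import math
import sqlite3

# Hypothetical coefficients, assumed to come from a model trained elsewhere.
INTERCEPT = -1.0
COEFFS = (0.5, 0.25)


def score(x1, x2):
    """Logistic-regression score for one row: a probability between 0 and 1."""
    z = INTERCEPT + COEFFS[0] * x1 + COEFFS[1] * x2
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Register the scoring function so that SQL statements can call it.
conn.create_function("score", 2, score)

conn.execute("CREATE TABLE customers (id INTEGER, x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 2.0, 0.0), (2, 0.0, 0.0)],
)

# Scoring happens where the data lives; only ids and scores are returned.
scores = dict(conn.execute("SELECT id, score(x1, x2) FROM customers"))
```

Model training does not appear here at all, which is the point of the "scoring only" distinction: the database just has to evaluate a fixed formula per row.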
Other common domains for in-database analytics include sessionization, time series analysis, and relationship analytics.
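As an illustration of the first of these, sessionization typically means grouping each user's events into sessions, starting a new session whenever the gap between consecutive events exceeds some threshold. The sketch below is plain Python rather than an in-database implementation. The 30-minute threshold and the `(user, timestamp)` event layout are assumptions for the example.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(events, gap_seconds=1800):
    """Assign a per-user session number to each (user, timestamp) event.

    A new session starts when more than gap_seconds have passed since the
    same user's previous event. Returns (user, timestamp, session_id) tuples.
    """
    result = []
    # Sorting puts each user's events together, in time order.
    for user, group in groupby(sorted(events), key=itemgetter(0)):
        session_id = 0
        previous_ts = None
        for _, ts in group:
            if previous_ts is not None and ts - previous_ts > gap_seconds:
                session_id += 1
            result.append((user, ts, session_id))
            previous_ts = ts
    return result


clicks = [("u1", 0), ("u2", 100), ("u1", 600), ("u1", 4000)]
sessions = sessionize(clicks)
```

Logic like this depends on row order within each user, which is awkward to express in standard SQL and is one reason it tends to be offered as an in-database extension.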
Notable products offering in-database analytics include:
- Teradata Aster SQL/MR.
- Multiple other analytic platforms, such as Sybase IQ, Vertica, or IBM Netezza. Indeed, in-database analytics are a defining feature of analytic platforms.
- Fuzzy Logix (for predictive analytics).
Comments
8 Responses to “In-database analytics — analytic glossary draft entry”
[…] our usage, an “analytic platform” is an analytic DBMS with well-integrated in-database analytics, or a data warehouse appliance that includes one. The term is also sometimes used to refer […]
I think that “in-database” means in the database instance.
It might be worth pointing out that in-database analytics are often deployed as SQL extensions, user-defined functions, stored procedures, etc.
I also wonder if, for a shared-nothing DBMS, in-database analytics implies a parallel implementation (not a single process controlled by the instance that looks like another tier even though it is in the instance)?
Rob,
I’m never sure what “in the same instance” means, so I’m trying to duck that one.
I’m also not sure what you mean by your second paragraph, truth be told.
As for your third paragraph, I’m really happy that it’s here in the comment thread should somebody click back to see it! I also might make that point in a blog post some time, which I’d then link from the definition. But I’m guessing I won’t put it into the glossary entry itself.
You seem to have left out both Oracle and SQL Server. On the ‘in-process’ side, Oracle offers Java and PL/SQL as options for table functions (and UDFs and UDAFs), since the VM runs in process. SQL Server offers C# (and maybe other languages that compile to the CLR), though I don’t know if they support parallel table functions now. Since the code runs against the VM, defects don’t crash the DB and there are no security concerns. Performance is better than out of process but not (yet) as good as a native function call.
Thanks.
Timely, too. I’m due next week for my first SQL Server briefing in a very long time.
[…] of in-database analytic processes. If you’re going to do in-database analytics, fenced/out-of-process ones are a lot safer than the […]
[…] rules or guidance regarding “analytical problems, situations, or techniques better suited for in-database versus in-memory processing”. There are actually two kinds of distinction to be […]
[…] talked with Teradata about a bunch of stuff yesterday, including this week’s announcements in in-database predictive modeling. The specific news was about partnerships with Fuzzy Logix and Revolution Analytics. But what I […]