Investment research and trading
Discussion of how data management and analytic technologies are used in trading and investment research. (As opposed to a discussion of the services we ourselves provide to investors.) Related subjects include:
- CEP (Complex Event Processing)
- (in Text Technologies) The use of text analytics in trading and investment research
The Sybase Aleri RAP
Well, I got a quick Sybase/Aleri briefing, along with multiple apologies for not being prebriefed. (Main excuse: News was getting out, which accelerated the announcement.) Nothing badly contradicted my prior post on the Sybase/Aleri deal.
To understand Sybase’s plans for Aleri and CEP, it helps to understand Sybase’s current CEP-oriented offering, Sybase RAP. So far as I can tell, Sybase RAP has to date only been sold in the form of Sybase RAP: The Trading Edition. In that guise, Sybase RAP has been sold to >40 outfits since its May, 2008 launch, mainly big names in the investment banking and stock exchange sectors. If I understood correctly, the next target market for Sybase RAP is telcos, for real-time network tuning and management.
In addition to any domain-specific applications, Sybase RAP has three layers:
- CEP (Complex Event Processing). Sybase RAP CEP is based on a version of the Coral8 engine that Sybase licensed and has subsequently been developing.
- In-memory DBMS. Sybase’s IMDB is part of (but, I gather, separable from) Sybase’s OLTP DBMS Adaptive Server Enterprise (ASE, aka Sybase Classic), and has the same API as ASE.
- Sybase IQ. Actually, Sybase used the phrase “based on Sybase IQ,” but I’m guessing it’s just Sybase IQ.
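To make the division of labor among those layers concrete, here’s a minimal, purely illustrative Python sketch of how one event might flow through a RAP-like three-layer stack: cheap immediate handling (CEP), fast recent-state lookup (in-memory DBMS), and durable storage for after-the-fact analysis (Sybase IQ). All names and thresholds below are my own inventions, not Sybase APIs; in the real product those roles are played by the Coral8-derived CEP engine, the ASE-compatible in-memory database, and Sybase IQ.

```python
from collections import defaultdict, deque

# Layer 1 stand-in: CEP-style continuous query -- a short sliding window per symbol,
# with an alert when the latest price strays more than 2% from the window average.
windows = defaultdict(lambda: deque(maxlen=5))

# Layer 2 stand-in: in-memory DBMS -- the latest state per symbol, for fast lookups.
in_memory_store = {}

# Layer 3 stand-in: Sybase IQ -- the full history, append-only, for deep analytics.
historical_archive = []

def on_tick(symbol, price):
    """Run one market tick through all three (simulated) layers."""
    window = windows[symbol]
    window.append(price)
    avg = sum(window) / len(window)
    if len(window) == window.maxlen and abs(price - avg) / avg > 0.02:
        print(f"CEP alert: {symbol} moved more than 2% off its short-term average")
    in_memory_store[symbol] = {"last_price": price, "window_avg": avg}
    historical_archive.append((symbol, price))

for p in (100, 101, 100, 102, 101, 110):   # toy tick stream
    on_tick("XYZ", p)
print(in_memory_store["XYZ"], len(historical_archive))
```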
Quick thoughts on Sybase/Aleri
Sybase announced an asset purchase that amounts to a takeover of CEP (Complex Event Processing) vendor Aleri. Perhaps not coincidentally, Sybase already had technology under the hood from Aleri predecessor/acquiree Coral8, for financial services uses (notwithstanding that, between Aleri Classic and Coral8, Aleri Classic was the one more focused on financial services). Quick reactions include:
- The folks at Sybase still haven’t figured out when to prebrief me. (Edit: I’ve been briefed subsequently.)
- Sybase/Aleri is a potentially powerful combination, if they can effectively address the point I just made about integrating disparate latencies. That said, I’m not expecting a lot, because the CEP industry always disappoints me.
- Microsoft, IBM, and (somewhat less clearly) Oracle are all trying to do CEP inhouse. Sybase is making a good choice in having serious CEP inhouse itself.
- Surely the main focus and financial justification for the Sybase/Aleri acquisition is the financial services market.
- Specifically, I expect the focus of technical integration between Aleri and Sybase’s DBMS products to start with Sybase IQ.
- Coral8 had some interesting ideas about how to integrate CEP with OLTP/operational BI, but I’m not aware that they got much traction.
- I bet there are use cases in which Sybase tries and fails to sell Adaptive Server SQL Anywhere even though CEP would be a better technical fit, but I don’t immediately see much practical business significance to that observation.
- While this deal could easily strengthen the Vertica/StreamBase partnership, I don’t see any reason why it would lead those two companies to actually merge.
Three broad categories of data
People often try to draw a distinction between:
- Traditional data of the sort that’s stored in relational databases, aka “structured.”
- Everything else, aka “unstructured” or “semi-structured” or “complex.”
There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. But of the many reasons why these distinctions don’t tend to work very well, I think the most important one is that:
Databases shouldn’t be divided into just two categories. Even as a rough-cut approximation, they should be divided into three, namely:
- Human/Tabular data – i.e., human-generated data that fits well into relational tables or arrays
- Human/Nontabular data — i.e., all other data generated by humans
- Machine-Generated data
Even that trichotomy is grossly oversimplified, for reasons such as:
- These categories overlap.
- There are kinds of data that get into fuzzy border zones.
- Not all data in each category has all the same properties.
But at least as a starting point, I think this basic categorization has some value.
A framework for thinking about data warehouse growth
There are only three ways that the amount of data stored in data warehouses can grow:
- The same kinds of data are stored as before, with more being added over time.
- The same kinds of data are stored as before, but in more detail.
- New kinds of data are stored.
Boston Big Data Summit keynote outline
Last month, Bob Zurek asked me to give a talk on “Big Data,” with “big” meaning anything from a few terabytes on up, and then to moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.
General introduction to Splunk
I dropped by log analysis software vendor Splunk a few weeks ago for a chat with Marketing VP Steve Sommer (whom some of you may know from Cognos and/or Informix), Product Management VP Christina Noren, and above all co-founder/CTO Erik Swan. Splunk turns out to be a pretty interesting company, from both business and technical standpoints. For one thing, Splunk seems highly regarded by most people I mention it to.
Splunk’s technical stories include:
- Text search over log files.
- Business intelligence over text search. (That part sounds a lot like Attivio.)
- MapReduce with schema flexibility and smart multi-stage execution plans. (That part sounds a lot like Aster Data.)
More on those in a separate post.
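To illustrate the first of those stories: “text search over log files” generally means indexing raw log lines without a predefined schema and extracting fields at search time. Here’s a toy sketch of that schema-on-read idea in Python; it is emphatically not Splunk’s search language or internals, and the log format is made up.

```python
import re

raw_logs = [
    "2009-10-01 12:00:01 host=web01 status=200 bytes=512 GET /index.html",
    "2009-10-01 12:00:02 host=web02 status=500 bytes=0 GET /checkout",
    "2009-10-01 12:00:03 host=web01 status=200 bytes=2048 GET /cart",
]

def extract_fields(line):
    """Schema on read: pull key=value pairs out of a raw line at query time."""
    return dict(re.findall(r"(\w+)=(\S+)", line))

def search(logs, keyword, **field_filters):
    """Keyword search over raw text, plus optional filters on extracted fields."""
    for line in logs:
        if keyword not in line:
            continue
        fields = extract_fields(line)
        if all(fields.get(k) == v for k, v in field_filters.items()):
            yield line

# "Find GET requests that returned a 500."
for hit in search(raw_logs, "GET", status="500"):
    print(hit)
```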
Less technical Splunk highlights include: Read more
Infobright notes
I had lunch with Bob Zurek and Susan Davis of Infobright today. This wasn’t primarily a briefing, but a few takeaways are:
- Infobright now has >100 paying customers.
- Typical database size is from the low 100s of gigabytes to the low single-digit number of terabytes.
- Agile development is at or approaching two-week release cycles.
- Like Kickfire, Infobright has a multi-year deal with MySQL that insulates it against many potential Oracle/MySQL shenanigans.
- From an industry perspective, Infobright’s customer base sounds a lot like other vendors’:
- Data mart outsourcing/online analytics
- Log files for websites
- Telecommunications
- Financial services
- OEM, especially in the markets cited above
- “Hey, we’re beginning to see the occasional energy deal”
- A few random others
- Infobright is seeing some household-name customers who surely have big-name analytic DBMS products, but who also have a policy that open source is the default choice; if open source can get the job done, then the favorite closed-source choices aren’t used.
- Infobright has the usual open-source community story — lots of involvement and engagement in the forums, but contributions are limited mainly to connectivity, utility scripts, etc. (Maybe some national language translation too; I’m not sure.)
How 30+ enterprises are using Hadoop
MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera’s files. Facts and metrics ranged widely, of course:
- Some are in heavy production with Hadoop, and closely engaged with Cloudera. Others are active Hadoop users but are very secretive. Yet others signed up for initial Hadoop training last week.
- Some have Hadoop clusters in the thousands of nodes. Many have Hadoop clusters in the 50-100 node range. Others are just prototyping Hadoop use. And one seems to be “OEMing” a small Hadoop cluster in each piece of equipment sold.
- Many export data from Hadoop to a relational DBMS; many others just leave it in HDFS (Hadoop Distributed File System), e.g. with Hive as the query language, or in exactly one case Jaql.
- Some are household names, in web businesses or otherwise. Others seem to be pretty obscure.
- Industries include financial services, telecom (Asia only, and quite new), bioinformatics (and other research), intelligence, and lots of web and/or advertising/media.
- Application areas mentioned — and these overlap in some cases — include:
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance
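To give a flavor of the first application area, log/clickstream analysis on Hadoop is often written as a small pair of map and reduce scripts (e.g., run under Hadoop Streaming). The sketch below counts page views per URL; it’s a generic illustration on a made-up input format, not something drawn from any of the customers Jeff described.

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit (url, 1) for each clickstream record.
    Assumes tab-separated records with the URL in the third column."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            yield fields[2], 1

def reducer(pairs):
    """Reduce step: sum counts per URL. Pairs must arrive grouped/sorted by URL,
    which is what Hadoop's shuffle phase guarantees."""
    for url, group in groupby(pairs, key=lambda kv: kv[0]):
        yield url, sum(count for _, count in group)

if __name__ == "__main__":
    # Toy local run; under Hadoop Streaming the map and reduce halves would be
    # separate scripts reading stdin and writing tab-separated stdout.
    sample = ["2009-11-01\tuser1\t/home", "2009-11-01\tuser2\t/home",
              "2009-11-01\tuser1\t/cart"]
    for url, views in reducer(sorted(mapper(sample))):
        print(f"{url}\t{views}")
```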
Sybase IQ business notes
As specialized analytic DBMS go, Sybase is near the top of the charts both in age (Sybase IQ was first introduced in the mid 1990s) and adoption. That’s even more true, of course, if we restrict the discussion strictly to columnar DBMS, aka column stores. Basic Sybase IQ adoption claims include:
- >1500 users
- >3000 installations (Sybase has variously cited 2.1 and 2.5+ as the installation/user ratio)
- At least ~50-60 installations with >5 terabytes of user data
Note that 98% of Sybase IQ installations are under 5 terabytes; the heart of Sybase IQ’s business is the sub-terabyte data warehouse market.*
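For what it’s worth, those figures hang together arithmetically. A quick back-of-the-envelope check (my arithmetic, using the rounded numbers above):

```python
installations = 3000   # ">3000 installations"
users = 1500           # ">1500 users"
big_installs = 60      # "at least ~50-60 installations with >5 terabytes"

print(f"Installations per user: {installations / users:.1f}")                   # ~2.0, near the cited 2.1-2.5+
print(f"Share of installations over 5 TB: {big_installs / installations:.0%}")  # ~2%, i.e. ~98% under 5 TB
```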
Vertica’s version of MapReduce integration
I talked with Omer Trajman of Vertica Monday night about Vertica’s MapReduce integration, part of its Vertica 3.5 release. Highlights included:
- By “integrating Vertica and MapReduce,” Vertica means “integrating Vertica and Hadoop.”
- Vertica’s Hadoop integration is based on Cloudera’s DBInputFormat.
- Omer called out for me several features of Vertica’s Hadoop integration that didn’t just come from Cloudera, namely:
- Cloudera’s DBInputFormat assumes the database runs on a single computer, or a single head node of an MPP system. Vertica’s technology, however, runs on peer parallel nodes with no head, and so Vertica adapted the DBInputFormat technology accordingly.
- Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and those who don’t.
- Vertica lets you do Reduce functions (or Map functions, if you don’t push them down to the database) on a separate cluster from the one running the database software. Vertica asserts that its customers and prospects all want to do this. Right here is the big difference between Vertica’s MapReduce integration and Aster’s or Greenplum’s. (Aster would also say that Vertica’s weaker MapReduce/SQL programming integration is a big difference as well.)
- Indeed, Vertica lets you Reduce into a different DBMS than Vertica, if you choose.
- Vertica gives you flexibility on the size of the Map and Reduce clusters. Omer agreed with me when I said there were some limits on how fast one can add or subtract nodes in a Vertica grid, because there’s data redistribution involved. But one can add/change/delete Hadoop clusters extremely quickly.
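Since DBInputFormat itself is Java code inside Hadoop, and Vertica hasn’t published its adaptation in detail, here is only a conceptual Python sketch of the idea behind the first bullet above: instead of funneling all input through one head node, each map task is handed its own slice of the database to query directly on some peer node. Every name and the partitioning predicate here are hypothetical.

```python
def plan_input_splits(db_nodes, table, splits_per_node=2):
    """Assign each map task a (node, query) pair, so map tasks read directly
    from the peer database nodes rather than through a single head node."""
    splits = []
    for node in db_nodes:
        for i in range(splits_per_node):
            predicate = f"MOD(row_hash, {splits_per_node}) = {i}"   # hypothetical partitioning column
            splits.append({"node": node, "query": f"SELECT * FROM {table} WHERE {predicate}"})
    return splits

def run_map_task(split, map_fn, fetch_rows):
    """A map task connects to its assigned node, runs its slice of the query,
    and applies the user's Map function to each row."""
    for row in fetch_rows(split["node"], split["query"]):
        yield from map_fn(row)

# Example wiring: three peer nodes, a trivial Map function, and a stand-in fetcher
# (a real implementation would open a database connection per split).
nodes = ["node1", "node2", "node3"]
fake_fetch = lambda node, query: [{"symbol": "XYZ", "qty": 100}]
emit_pairs = lambda row: [(row["symbol"], row["qty"])]

for split in plan_input_splits(nodes, "trades"):
    print(split["node"], list(run_map_task(split, emit_pairs, fake_fetch)))
```

The Reduce side, in this picture, simply consumes the pairs the map tasks emit, and per the bullets above can run on a separate Hadoop cluster and even write its output to a different DBMS.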
Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically: Read more