Investment research and trading

Discussion of how data management and analytic technologies are used in trading and investment research. (As opposed to a discussion of the services we ourselves provide to investors.) Related subjects include:

CEP (Complex Event Processing)
(in Text Technologies) The use of text analytics in trading and investment research

November 28, 2011

Agile predictive analytics — the “easy” parts

I’m hearing a lot these days about agile predictive analytics, albeit rarely in those exact terms. The general idea is unassailable, in that it boils down to using data as quickly as reasonably possible. But discussing particulars is hard, for several reasons:

Pundits tend to sketch castles in the air.
Vendors tend to confuse part of the story — generally the part they happen to offer 🙂 — with the whole.
Different use cases give rise to different kinds of issues.

At least three of the generic arguments for agility apply to predictive analytics:

Doing the correct thing soon is usually better than doing the same correct thing later.
If it doesn’t take much time to do something, hopefully it doesn’t take that much expense (labor and so on) either.
It’s hard to get new stuff completely right on the first try. Often, the best strategy is to come close fast, then fix what’s still not ideal.

But the reasons to want agile predictive analytics don’t stop there.

Categories: EAI, EII, ETL, ELT, ETLT, Investment research and trading, Predictive modeling and advanced analytics

15 Comments

November 28, 2011

Terminology: Data mustering

I find myself in need of a word or phrase that means bring data together from various sources so that it’s ready to be used, where the use can be analysis or operations. The first words I thought of were “aggregation” and “collection,” but they both have other meanings in IT. Even “data marshalling” has a specific meaning different from what I want. So instead, I’ll go with data mustering.

I mean for the term “data mustering” to encompass at least three scenarios:

Integrated (relational) data warehouse.
Big bit bucket.
Big bit stream.

Let me explain what I mean by each. Read more

Categories: Data warehousing, Investment research and trading, Streaming and complex event processing (CEP), Sybase, Teradata

12 Comments

November 21, 2011

Some big-vendor execution questions, and why they matter

When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:

“Deliver products that really meet customers’ desires and needs.”
“Successfully convince them that you’re doing so …”
“… at an attractive overall cost.”

Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:

salesforce.com (multiple subjects).
SAS HPA.
The evolution of Hadoop.

Categories: Business intelligence, Cognos, Columnar database management, Data warehouse appliances, Data warehousing, Exadata, Hadoop, HP and Neoview, IBM and DB2, In-memory DBMS, Investment research and trading, Memory-centric data management, Netezza, NoSQL, Oracle, SAP AG, Vertica Systems

2 Comments

November 10, 2011

StreamBase catchup

While I was cryptic in my general CEP/streaming catchup, I’ll say a bit more regarding StreamBase in particular. At the highest level, non-technically:

StreamBase once planned to conquer the world.
However, StreamBase really only sold effectively in the financial trading and intelligence markets.
StreamBase retrenched, focusing almost exclusively on the financial trading market.
With StreamBase LiveView, StreamBase is expanding from embedded operational analytics to do (also operational) business intelligence as well.
StreamBase is hopeful that, perhaps starting with Version 2 or so, LiveView will be successful outside the financial trading market.

Categories: Investment research and trading, Parallelization, StreamBase, Streaming and complex event processing (CEP)

2 Comments

October 11, 2011

IBM is buying parallelization expert Platform Computing

IBM is acquiring Platform Computing, a company with which I had one briefing, last August. Quick background includes: Read more

Categories: Hadoop, IBM and DB2, Investment research and trading, MapReduce, Parallelization, Scientific research

5 Comments

September 12, 2011

Hadoop notes

I visited California recently, and chatted with numerous companies involved in Hadoop — Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I’ll defer further Hadoop technical discussions for now — my target to restart them is later this month — but that still leaves some other issues to discuss, namely adoption and partnering.

The total number of enterprises in the world paying subscription and license fees that they would regard as being for “Hadoop or something Hadoop-related” probably is not much over 100 right now, but I’d expect to see pretty rapid growth. Beyond that, let’s divide customers into three groups:

Internet businesses.
Traditional enterprises ‘ internet operations.
Traditional enterprises’ other operations.

Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of machine-generated data, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play — web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it’s one area of scientific research that actually enjoys fat for-profit research budgets.

Categories: Cloudera, Hadoop, Health care, Hortonworks, Investment research and trading, Log analysis, MapR, MapReduce, Market share and customer counts, Scientific research, Web analytics

5 Comments

July 6, 2011

Petabyte-scale Hadoop clusters (dozens of them)

I recently learned that there are 7 Vertica clusters with a petabyte (or more) each of user data. So I asked around about other petabyte-scale clusters. It turns out that there are several dozen such clusters (at least) running Hadoop.

Cloudera can identify 22 CDH (Cloudera Distribution [of] Hadoop) clusters holding one petabyte or more of user data each, at 16 different organizations. This does not count Facebook or Yahoo, who are huge Hadoop users but not, I gather, running CDH. Meanwhile, Eric Baldeschwieler of Hortonworks tells me that Yahoo’s latest stated figures are:

42,000 Hadoop nodes …
… holding 180-200 petabytes of data.

Categories: Cloudera, Facebook, Hadoop, Investment research and trading, Log analysis, MapReduce, Market share and customer counts, Petabyte-scale data management, Scientific research, Web analytics, Yahoo

13 Comments

July 5, 2011

Eight kinds of analytic database (Part 2)

In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Buying processes, Cloud computing, Columnar database management, Data mart outsourcing, Data types, Data warehouse appliances, Data warehousing, Database compression, Database diversity, EAI, EII, ETL, ELT, ETLT, Greenplum, Hadoop, Investment research and trading, Log analysis, MapReduce, MOLAP, MySQL, Netezza, NoSQL, Open source, Petabyte-scale data management, Predictive modeling and advanced analytics, Rainstor, SAND Technology, Scientific research, SenSage, Software as a Service (SaaS), Streaming and complex event processing (CEP), Telecommunications, Vertica Systems, Web analytics

6 Comments

July 5, 2011

Eight kinds of analytic database (Part 1)

Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.

Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more

Categories: Analytic technologies, Aster Data, Benchmarks and POCs, Business intelligence, Buying processes, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Database diversity, Exadata, Greenplum, IBM and DB2, Infobright, Investment research and trading, Log analysis, Microsoft and SQL*Server, MOLAP, Netezza, OLTP, Oracle, ParAccel, Parallelization, Petabyte-scale data management, Predictive modeling and advanced analytics, Pricing, QlikTech and QlikView, SAND Technology, Scientific research, Sybase, Teradata, Vertica Systems, Web analytics, Workload management

7 Comments

June 20, 2011

Temporal data, time series, and imprecise predicates

I’ve been confused about temporal data management for a while, because there are several different things going on.

Date arithmetic. This of course has been around for a very long — er, for a very long time.
Time-series-aware compression. This has been around for quite a while too.
“Time travel”/snapshotting — preserving the state of the database at previous points in time. This is a matter of exposing (and not throwing away) the information you capture via MVCC (Multi-Version Concurrency Control) and/or append-only updates (as opposed to update-in-place). Those update strategies are increasingly popular for pretty much anything except update-intensive OLTP (OnLine Transaction Processing) DBMS, so time-travel/snapshotting is an achievable feature for most vendors.
Bitemporal data access. This occurs when a fact has both a transaction timestamp and a separate validity duration. A Wikipedia article seems to cover the subject pretty well, and I touched on Teradata’s bitemporal plans back in 2009.
Time series SQL extensions. Vertica explained its version of these to me a few days ago. I imagine Sybase IQ and other serious financial-trading market players have similar features.

In essence, the point of time series/event series SQL functionality is to do SQL against incomplete, imprecise, or derived data.* Read more

Categories: Analytic technologies, Data types, Investment research and trading, Log analysis, Sybase, Telecommunications, Theory and architecture, Vertica Systems

2 Comments

← Previous Page — Next Page →

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Investment research and trading

Agile predictive analytics — the “easy” parts

Terminology: Data mustering

Some big-vendor execution questions, and why they matter

StreamBase catchup

IBM is buying parallelization expert Platform Computing

Hadoop notes

Petabyte-scale Hadoop clusters (dozens of them)

Eight kinds of analytic database (Part 2)

Eight kinds of analytic database (Part 1)

Temporal data, time series, and imprecise predicates

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin