The database architecture of salesforce.com, force.com, and database.com
salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That’s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as:
- A long-ago marketing decision to not give infrastructure details, so as to convey a “Don’t worry; we’ll take care of everything” message.
- Even so, a long-ago and perhaps now-regretted marketing decision to disclose and even exaggerate salesforce.com’s reliance on Oracle, as part of an early-days attempt to prove salesforce was using enterprise-class technology.
- A desire to hide the recipe for salesforce.com’s secret sauce.
- Force of habit — I’m not sure salesforce even knows how to tell its technical story with any clarity.
Actually, salesforce.com has moved some kinds of data out of Oracle that used to be stored there. Besides Oracle, salesforce uses at least a file system and a RAM-based data store, about which I have no details. Even so, much of salesforce.com’s data is stored in Oracle, specifically in a single instance of Oracle, which it believes may be the largest instance of Oracle in the world.
Salesforce did spell out some of its database story in a 2008 force.com white paper, which is good stuff, but potentially misleading in one important way. The paper tells of a level of abstraction, whereby what the application sees as logical “columns” are stored in a very different schema than one might assume. However, it doesn’t spell out a second level of abstraction, whereby that logical schema also isn’t how the database is actually laid out.
Another flaw in the paper is its spin: issues of the form “We had to do this to support multitenancy, so we did” are presented as “Because we’re multitenant, we can do this, while single-tenant systems can’t.” One example is the query optimization step around “user visibility” in Figure 11. Welcome to marketing.
At the first level of abstraction, data seems to be kept mainly in a single wide table, with hundreds of columns. What’s more, many of those are “flex columns”; a flex column can hold data of many different kinds and even datatypes. Notwithstanding the second level of abstraction, I imagine the idea of stuffing different kinds of thing into the same column has something to do with the fact that Oracle’s physical limit on columns falls far short of the number of logical columns salesforce wants to use.
If we imagine that the different kinds of data in a flex column were each in their own column instead, the whole thing might sound like BigTable/Cassandra/HBase-style column-group NoSQL. Thus, much as Workday uses MySQL to simulate a key-value store, salesforce.com can be said to use Oracle to simulate a different kind of NoSQL. In both cases, what’s going on seems to be a kind of object/relational mapping, but with the relational aspect strongly deemphasized. Or, if you take a more relational view, we could say that salesforce.com’s tables are a lot wider than any one user organization’s, because each user sees only its own custom columns (plus the standard ones common to all users).
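The flex-column idea can be sketched in a few lines of Python. This is only an illustration of the first level of abstraction as described above, not salesforce.com's actual schema: the class names, the slot count of 8, the text encoding of values, and the organization and field names are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import date

NUM_FLEX_SLOTS = 8  # illustrative only; the post just says "hundreds of columns"

# Converters between typed values and the untyped text kept in a flex slot.
ENCODERS = {"text": str, "number": repr, "date": date.isoformat}
DECODERS = {"text": str, "number": float, "date": date.fromisoformat}


@dataclass
class FieldDef:
    slot: int       # which physical flex column holds this logical column
    datatype: str   # how to interpret the text stored in that slot


@dataclass
class WideTable:
    # Metadata: (org, logical column name) -> FieldDef
    fields: dict = field(default_factory=dict)
    # Data: one shared list of rows for every organization
    rows: list = field(default_factory=list)

    def add_custom_field(self, org, name, datatype):
        """Adding a logical column only touches metadata; no data is migrated."""
        used = {f.slot for (o, _), f in self.fields.items() if o == org}
        # Raises StopIteration if the org has used every slot.
        slot = next(i for i in range(NUM_FLEX_SLOTS) if i not in used)
        self.fields[(org, name)] = FieldDef(slot, datatype)

    def insert(self, org, record):
        slots = [None] * NUM_FLEX_SLOTS
        for name, value in record.items():
            f = self.fields[(org, name)]
            slots[f.slot] = ENCODERS[f.datatype](value)
        self.rows.append({"org": org, "slots": slots})

    def select(self, org):
        """Rebuild the org's logical rows from the shared physical rows."""
        defs = {n: f for (o, n), f in self.fields.items() if o == org}
        result = []
        for row in self.rows:
            if row["org"] != org:
                continue
            result.append({
                name: DECODERS[f.datatype](row["slots"][f.slot])
                for name, f in defs.items()
                if row["slots"][f.slot] is not None
            })
        return result


table = WideTable()
table.add_custom_field("acme", "region", "text")
table.add_custom_field("acme", "renewal", "date")
table.add_custom_field("globex", "score", "number")
table.insert("acme", {"region": "EMEA", "renewal": date(2011, 6, 30)})
table.insert("globex", {"score": 3.5})
```

In this sketch, acme's text column "region" and globex's numeric column "score" both land in flex slot 0, so the same physical column holds different kinds of data with different datatypes, and each organization sees only its own logical columns.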
The second layer of abstraction has a lot to do with multitenancy. If you want to stick data for many different user organizations into the same huge table, then you have to label it in some way to show who is permitted to see or update each part. Logically, this leads to a join between one table carrying data plus a simple key showing which users/roles are entitled to see it, and a second table showing who actually is that kind of user/has that kind of role. But it makes a lot of sense to store the result of that join in denormalized form, all the more because data is partitioned across the computer cluster in line with which user organization it actually belongs to.
Multitenant security isn’t the only reason for this denormalization, but it appears to be the biggest one.
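To make the join-versus-denormalization point concrete, here is a minimal Python sketch. It is not salesforce.com's layout, which the company hasn't disclosed; the record fields, sharing keys, user names, and function names are all hypothetical. It shows the logical join described above, and then the same answer read from rows that carry their visibility information directly and are grouped by organization.

```python
from collections import defaultdict

# Hypothetical normalized layout: each row carries only a simple sharing key...
RECORDS = [
    {"id": 1, "org": "acme", "share_key": "sales", "payload": "Q3 forecast"},
    {"id": 2, "org": "acme", "share_key": "finance", "payload": "payroll"},
    {"id": 3, "org": "globex", "share_key": "sales", "payload": "lead list"},
]
# ...and a second table says which users hold each key within an organization.
MEMBERSHIPS = {
    ("acme", "sales"): {"alice", "bob"},
    ("acme", "finance"): {"bob"},
    ("globex", "sales"): {"carol"},
}


def visible_normalized(user, org):
    """Join every candidate row to the membership table at query time."""
    return [
        r["id"]
        for r in RECORDS
        if r["org"] == org
        and user in MEMBERSHIPS.get((org, r["share_key"]), set())
    ]


def denormalize(records, memberships):
    """Precompute the join: copy the permitted users onto each row,
    and partition the rows by the organization that owns them."""
    partitions = defaultdict(list)
    for r in records:
        users = frozenset(memberships.get((r["org"], r["share_key"]), ()))
        partitions[r["org"]].append({**r, "visible_to": users})
    return partitions


def visible_denormalized(user, org, partitions):
    """Read one organization's partition; no join is needed."""
    return [r["id"] for r in partitions.get(org, []) if user in r["visible_to"]]


PARTITIONS = denormalize(RECORDS, MEMBERSHIPS)
```

The trade-off in this sketch is the usual one: reads touch a single partition and skip the join, but a change in who holds a sharing key means rewriting every row that carries it.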
The whole thing is doing 550 million or so transactions per day. salesforce.com thinks that fact should be regarded as evidence that it works. 🙂
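For a sense of scale, the daily figure can be converted into average rates. The short calculation below assumes the transactions are spread evenly over a 24-hour day, which is a simplification; real load presumably peaks during business hours, so peak rates would be higher than these averages.

```python
TRANSACTIONS_PER_DAY = 550_000_000  # "550 million or so" from the post

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# Average rates, assuming an even spread across the whole day.
per_second = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY
per_minute = TRANSACTIONS_PER_DAY / MINUTES_PER_DAY

print(f"{per_second:,.0f} transactions/second on average")
print(f"{per_minute:,.0f} transactions/minute on average")
```

That works out to roughly 6,400 transactions per second, or about 380,000 per minute.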
Comments
Curt,
550M transactions per day (24h?) amounts to about 6366 transactions per second. Far less impressive than it sounds.
Dan Koren
“which it believes may be the largest instance of Oracle in the world.” Is that a measure of the number of CPUs/cores, of data volume, or of something else?
Gary,
I don’t know.
I’d GUESS CPUs/cores or something — SaaS applications lend themselves to parallelization really well, because you’re doing the same thing at once on many small subsets of the data.
Dan,
Good point of arithmetic. I would guess those are heftier transactions than one sees on most other real-world systems that average 1000s of transactions/second. But yes, this is a really big, serious transactional database, not something that is off-the-charts humongous.
One thing to note is that salesforce is not pitching “Buy from us and set up a database just like the one big one we’ve proven the technology on already.” Rather, salesforce is pitching “Buy from us and participate in the big database that’s already been up and running without serious incident for years.” The latter is indeed a somewhat stronger pitch.
I would like to point out that it’s not a single Oracle instance but rather a single Oracle Real Application Cluster (Oracle RAC), running on several high-end Sun boxes, that backs all of these services. In Oracle terminology, an instance refers to a single server. An Oracle database (a collection of physical files) is managed either by a single instance or by a RAC consisting of multiple instances.
I attended SOCC’10 last year, where Salesforce’s CTO Rob Woollen gave a keynote on the force.com architecture [http://research.microsoft.com/en-us/um/redmond/events/socc2010/woollen.htm]. Judging by my own recollection and the notes a fellow attendee took (http://sna-projects.com/blog/2010/06/socc-2010-updates/), Salesforce used an 8-node Oracle RAC as of 2010, and Rob talked about the gradual extension of this installation over the years. It might be more nodes by now.
Dan,
We should also note that it’s easy to confuse our intuitions about transactions/second and transactions/minute. 380,000 transactions/minute is quite a lot, especially as an average rather than a peak.
Thanks for the data, Gera!
That kind of throughput over a long period of time is at the top end of what Microsoft also considers large database workloads:
210B txn p.a.
396K txn per minute
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2008/Mediterranean-Shipping-Company/Mediterranean-Shipping-Company-Managing-22-Terabytes-of-data-with-SQL-Server-2008/4000003470
If some of the columns are what amount to custom fields for certain customers, does that mean signing a new major customer might require a (presumably extensive) migration of their table?
@Meng,
Nope. You just add new custom columns (at least logically). Hence my analogy to BigTable/Cassandra/HBase.
@Gera,
They ditched the Sun servers and replaced them with Dell.
http://content.dell.com/us/en/enterprise/d/corporate~case-studies~en/Documents~2010-sfdc-10008118.pdf.aspx