How is MySQL’s join performance these days?
In a comment thread on a recent post comparing MySQL to Postgres, Jonathon Moore chimed in based on experience with both products. His characterization of some MySQL problems:
Regardless of what back end you choose the mysql query engine is week. It only supports nested loop scan, no merge join, no hash join, nothing fancy. Further sub quires can’t even use indexes so they are near useless. The rule when using mysql is *don’t join*. So the case where mysql can be put to good use is where all your quires are over a single table, which is why it has done well for web apps I wold suspect.
was similar to those in a mid-2006 post on MySQL Performancing Blog, which said:
One of the reasons elevating this problem in MySQL is lack of advanced join methods at this point (the work is on a way) – MySQL can’t do hash join or sort merge join – it only can do nested loops method which requires a lot of index lookups which may be random.
What’s the current situation, mid-2008? Surely data warehousing storage engines such as Infobright’s aren’t so artificially limited in their join strategies.
Comments
8 Responses to “How is MySQL’s join performance these days?”
Leave a Reply
KFDB (the storage engine used in the Kickfire appliances) supports multiple types of joins beyond nested loops, but only for tables backed by the KFDB storage engine.
Kickfire supports hash joins as well as additional join methods that leverage hardware acceleration. Kickfire also features accelerated joins to specialized indexes in addition to the performance benefits garnered by vertical column storage. Join performance may be orders of magnitude faster than other storage engines when using specialized join algorithms.
Some storage engines, including Infobright’s, implement their own join algorithms. Infobright implements two join techniques: nested loop and sort-merge. We are also in the process of implementing hash-join. Since we don’t use indices but rather the Knowledge Grid, our implementation of these techniques is different and the performance dynamic much improved.
We also implement subqueries using the Knowledge Grid; so we significantly out perform other MySQL storage engines on large datasets. And our clients consistently use joins and subqueries in their queries.
Victoria (or anybody),
I probably knew the answer to this, but I’ve forgotten:
When you write your own MySQL storage engine, what happens with the optimizer? Do you do one from scratch? Does MySQL have a basic cost-based optimizer for which you write your own cost functions?
Thanks,
CAM
If you implement only the storage engine interface then you would using the MySQL optimizer. This interface is row oriented and expects indices, which are used among other things to determine the costing.
Since Brighthouse is both column oriented and uses the knowledge grid instead of indices, this type of optimizer is not a good fit for us. So although we have implemented this interface and use it in rare cases, we have written our own optimizer to take advantage of the knowledge grid during query execution.
And since it’s open source, you can write your own modules even beyond the default cases for doing so. Duh.
Thanks!
CAM
Really true. i am satisfied reading your article.
I can do well. if you get me scholarship. i want to do Msc in software Engineering. i have strong eagernesses,but nobody want help me in this world.
What are the advantages and disadvantages of using PostgreSQL over MySQL?…
First off, I believe that whatever you’re doing, you’ll probably be fine using either. That said, I’m generally more frustrated when using MySQL than when using PG. Here’s a sample of the problems I’ve encountered from using both MySQL and PG. I h…
Why do people typically choose PostgreSQL over MySQL?…
First off, I believe that whatever you’re doing, you’ll probably be fine using either. That said, I’m generally more frustrated when using MySQL than when using PG. Here’s a sample of the problems I’ve encountered from using both MySQL and PG. I h…