MORE SQL?

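I came across an article on CSDN titled "MoreSQL (NewSQL) challenges NoSQL?". Its gist is that database
performance can be improved without replacing the existing relational database.
The article says that NoSQL is so popular right now because relational databases perform extremely
poorly on join operations. I then looked at Tokutek's website, which explains why this happens in
relational databases, but I still don't get it: it says that during a query the index ends up being
"joined" to the primary key. I've thought about it for a long time without figuring out why. I'd appreciate
it if the experts here could point me in the right direction. Here is part of the original article: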

Did you know that the following query actually performs a JOIN? You can’t see it, but it’s there:

SELECT the_day, COUNT(*), SUM(clicks), SUM(cost)
FROM ad_clicks_by_day
WHERE the_day >= '2005-07-01' AND the_day < '2005-07-07'
GROUP BY the_day;
Let me explain.

Suppose you define the table as follows:

CREATE TABLE ad_clicks_by_day (
customer INT NOT NULL,
the_day DATE NOT NULL,
clicks INT NOT NULL DEFAULT 0,
cost INT NOT NULL DEFAULT 0,
PRIMARY KEY(customer, the_day),
INDEX(the_day)
) ENGINE=InnoDB;
What happens when MySQL executes this query? Here’s the EXPLAIN:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ad_clicks_by_day
type: range
possible_keys: the_day
key: the_day
key_len: 3
ref: NULL
rows: 368
Extra: Using where
That looks fine, doesn’t it? It’s using an index range scan to find approximately 368 matching rows, and adding up the clicks and cost for them. Can it be done any more efficiently than this?

In fact, it turns out that it’s possible to execute this query much more efficiently. What happens internally when the server executes this query is that it begins reading from the ‘the_day’ index, and for each row it finds, it performs a lookup in the primary key (which, in InnoDB, stores the whole table) to find the other columns mentioned in the query. That is, it’s joining the index to the primary key! This is not just an ivory-tower abstraction. In real-life workloads, the random I/O caused by such index-to-table joins slows queries dramatically.
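To make the hidden join concrete, here is a toy Python model of that layout, under simplifying assumptions. The InnoDB primary key is modelled as a dict keyed by (customer, the_day) that holds the whole row (standing in for the clustered B-tree). The secondary index on the_day is modelled as a sorted list of (the_day, customer) entries, because an InnoDB secondary index record carries the primary-key columns rather than the other columns of the row. The sample data and the names build_table and range_query are hypothetical, and pages, caching and disk I/O are not modelled at all; the sketch only counts how many primary-key lookups the range scan forces.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta


def build_table(customers, days, start):
    """Build a toy ad_clicks_by_day with made-up numbers (hypothetical data)."""
    # Stand-in for the InnoDB primary key: the clustered index holds the whole
    # row, keyed by (customer, the_day).
    clustered = {}
    for c in range(customers):
        for d in range(days):
            day = start + timedelta(days=d)
            clustered[(c, day)] = (c + d, 2 * (c + d))  # (clicks, cost)
    # Stand-in for INDEX(the_day): sorted (the_day, customer) entries. Like an
    # InnoDB secondary index, it carries the PK columns but not clicks or cost.
    secondary = sorted((day, c) for (c, day) in clustered)
    return clustered, secondary


def range_query(clustered, secondary, lo, hi):
    """Per-day COUNT(*), SUM(clicks), SUM(cost) for lo <= the_day < hi via the secondary index."""
    totals = defaultdict(lambda: [0, 0, 0])
    pk_lookups = 0
    i = bisect_left(secondary, (lo,))  # first entry with the_day >= lo
    while i < len(secondary) and secondary[i][0] < hi:
        day, customer = secondary[i]
        # The hidden "join": clicks and cost live only in the clustered index,
        # so every matching index entry costs one primary-key lookup.
        clicks, cost = clustered[(customer, day)]
        pk_lookups += 1
        agg = totals[day]
        agg[0] += 1
        agg[1] += clicks
        agg[2] += cost
        i += 1
    return {day: tuple(agg) for day, agg in sorted(totals.items())}, pk_lookups


if __name__ == "__main__":
    clustered, secondary = build_table(customers=50, days=30, start=date(2005, 7, 1))
    rows, lookups = range_query(clustered, secondary, date(2005, 7, 1), date(2005, 7, 7))
    print(len(rows), "days,", lookups, "primary-key lookups")  # 6 days, 300 primary-key lookups
```

Six days times 50 customers gives 300 lookups. In the dict model each one is cheap; the article's point is that in a real engine those lookups land on pages scattered across the clustered index, because rows for the same day but different customers are stored far apart when the key starts with customer.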

The “join” here isn’t quite the same as a true table-to-table SQL JOIN, and it has a little less overhead at the server level, but the index-to-row lookup it performs works in much the same way.

If you’re skilled at logical and physical database design, you already noticed that my example can be improved by indexing differently: just put the primary key on (the_day, customer) instead of the other way around. Then the query will use the primary key instead of the secondary key, and the whole row is available to the query without doing a join to another index. This is called a covering index, which means that the index covers the whole query: there is no need for any data outside the index. You’ll see “Using index” in the Extra column from EXPLAIN when an index covers a query.
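The reordering can be tried without a MySQL server. The sketch below uses Python's standard sqlite3 module, and it should be read as an analogy rather than a reproduction of MySQL's behavior: SQLite is not InnoDB, but a WITHOUT ROWID table is likewise stored as a B-tree clustered on its primary key, and its secondary indexes likewise carry the primary-key columns. The helper plan_for and the index names idx_day and idx_cover are hypothetical. SQLite's EXPLAIN QUERY PLAN wording differs from MySQL's EXPLAIN and varies between versions; roughly, a plain "USING INDEX idx_day" corresponds to the index-plus-lookup case, while "USING PRIMARY KEY" and "USING COVERING INDEX" mean every needed column comes from a single index.

```python
import sqlite3

QUERY = """
SELECT the_day, COUNT(*), SUM(clicks), SUM(cost)
FROM ad_clicks_by_day
WHERE the_day >= '2005-07-01' AND the_day < '2005-07-07'
GROUP BY the_day
"""


def plan_for(primary_key, extra_indexes=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for QUERY under a given layout."""
    conn = sqlite3.connect(":memory:")
    try:
        # WITHOUT ROWID stores the table clustered on its primary key, much like InnoDB.
        # the_day is TEXT here; ISO-8601 date strings compare correctly as text.
        conn.execute(f"""
            CREATE TABLE ad_clicks_by_day (
                customer INT NOT NULL,
                the_day  TEXT NOT NULL,
                clicks   INT NOT NULL DEFAULT 0,
                cost     INT NOT NULL DEFAULT 0,
                PRIMARY KEY ({primary_key})
            ) WITHOUT ROWID""")
        for ddl in extra_indexes:
            conn.execute(ddl)
        # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
        rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
        return " | ".join(row[-1] for row in rows)
    finally:
        conn.close()


# Layout from the article: PK (customer, the_day) plus INDEX(the_day).
ORIGINAL = plan_for("customer, the_day",
                    ["CREATE INDEX idx_day ON ad_clicks_by_day (the_day)"])
# Suggested fix: PK reordered to (the_day, customer).
REORDERED_PK = plan_for("the_day, customer")
# For comparison only: original PK plus a secondary index that covers the query.
COVERING = plan_for("customer, the_day",
                    ["CREATE INDEX idx_cover ON ad_clicks_by_day (the_day, clicks, cost)"])

if __name__ == "__main__":
    print("original:    ", ORIGINAL)
    print("reordered PK:", REORDERED_PK)
    print("covering:    ", COVERING)
```

The third layout keeps the clustered key in its original order and instead adds a wider secondary index; it is included only to show what a covering plan looks like next to the reordered primary key.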

But then what if we want to group the results by customer? No problem, we can just add another index. To achieve the same results for ad-hoc querying, we’ll need to index every column:

The original article is at: http://www.tokutek.com/2011/09/are-you-forcing-mysql-to-do-twice-as-many-joins-as-necessary/
