MySQL may be the best fit for Uber, but not necessarily for you

 

https://coyee.com/article/10766-mysql-might-be-right-for-uber-but-not-for-you?fromoschina

 

Uber published an article the other day, "Why Uber Engineers Switched from Postgres to MySQL". I didn't read it right away because my gut told me I'd be better off going home and doing something else. But then my inbox filled up with questions like "is PostgreSQL really that bad?" Knowing that PostgreSQL is not that bad, those emails made me wonder what on earth the article was about. This post is my commentary on the problems described in Uber's article.

In my opinion, the Uber article essentially says that they found MySQL to be a better fit for their environment than PostgreSQL. However, the way the article delivers that message is poor. Instead of claiming "flaws in Postgres' write architecture", they should have written about "some limitations of PostgreSQL under update-heavy workloads". If your use case is not update-heavy, you don't need to worry about the issues Uber describes.

 

This post explains why I don't think Uber's article can be taken as general advice on choosing a database, why MySQL may indeed be a good fit for Uber, and why success can cause more problems than just scaling the data store.

UPDATE operation

The first big problem the Uber article describes, without giving the full picture, is that PostgreSQL always needs to update all indexes on a table when it updates a row, while MySQL/InnoDB only needs to update indexes on the columns that actually changed. PostgreSQL's approach causes more disk I/O for updates that change only non-indexed columns ("write amplification" in the article's terminology). If this is a big problem for Uber, these updates presumably make up a large share of their overall workload.

 
This is also where the Uber article overstates its case a little, because it doesn't mention PostgreSQL's Heap-Only Tuples (HOT). According to the PostgreSQL documentation, HOT is made exactly for the scenario where "a tuple is repeatedly updated while its indexed columns do not change". In that case PostgreSQL can update the row without touching any index, provided the new row version fits on the same page as the previous version, which you can encourage with the fillfactor setting. I'm guessing the Uber engineers realized that HOT is no solution to their problem, because their frequent updates change at least one indexed column.
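To make this concrete, here is a minimal sketch of how fillfactor is set and how to check whether HOT updates actually happen. The table and column names are invented for illustration and are not from the Uber article; fillfactor and the pg_stat_user_tables view are standard PostgreSQL features.

    -- Reserve free space on each heap page so that new row versions can stay
    -- on the same page as the old ones, which is a precondition for HOT.
    CREATE TABLE trips (
        id         bigserial PRIMARY KEY,
        rider_id   bigint,
        status     text,            -- frequently updated, not indexed
        created_at timestamptz
    ) WITH (fillfactor = 70);

    CREATE INDEX trips_rider_idx ON trips (rider_id);

    -- An update that touches only the non-indexed column can be a HOT update:
    UPDATE trips SET status = 'completed' WHERE id = 1;

    -- After running the workload, compare HOT updates with total updates:
    SELECT n_tup_upd, n_tup_hot_upd
    FROM   pg_stat_user_tables
    WHERE  relname = 'trips';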
 

This assumption is also supported by this sentence of the article: "If a table has a dozen indexes defined on it, an update to a field that is covered by only a single index must be propagated into all twelve indexes to reflect the ctid for the new row." The telling phrase is "covered by only a single index": that is the borderline case, exactly one index, for which HOT cannot help; otherwise, PostgreSQL's HOT mechanism would solve the problem.

Side note: I'm really curious whether their number of indexes could be reduced; I'd suggest reviewing the index design. It's entirely possible that some of these indexes are rarely used, yet still have to be maintained on every write.

 

It looks like Uber runs a lot of UPDATEs that change at least one indexed column, even though for a table with that many indexes this is still a comparatively small fraction of the columns. If that is the dominant use case, the article's argument for preferring MySQL over PostgreSQL makes sense.

SELECT query

This is another statement in the Uber article that caught my attention: the article explains that MySQL/InnoDB uses clustered indexes, and acknowledges that "this design means InnoDB is at a slight disadvantage compared to Postgres when doing a secondary key lookup, since two indexes must be searched with InnoDB versus just one for Postgres." I have written about this problem before ("the clustered index penalty") in the context of SQL Server.
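To illustrate what the quote means, here is a sketch with invented names (not from the Uber article): with InnoDB's clustered layout, a secondary index stores primary-key values rather than physical row locations, so every secondary-index lookup implies a second descent through the primary-key index.

    -- InnoDB stores the table itself as a B-tree ordered by the primary key.
    CREATE TABLE trips (
        id       BIGINT NOT NULL PRIMARY KEY,   -- clustering key
        rider_id BIGINT,
        status   VARCHAR(20),
        KEY trips_rider_idx (rider_id)          -- leaf entries hold (rider_id, id)
    ) ENGINE=InnoDB;

    -- InnoDB: descend trips_rider_idx to find matching id values, then descend
    -- the primary-key B-tree again to fetch each full row (two lookups).
    -- PostgreSQL: the equivalent index entry holds a ctid that points directly
    -- at the heap row, so a single index descent is enough.
    SELECT status FROM trips WHERE rider_id = 42;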

 
What caught my attention is that they describe the clustered index as only a slight disadvantage, whereas in my experience the disadvantage is quite noticeable if you run many queries through secondary indexes. If it is only a slight disadvantage for them, that is a sign that those secondary indexes are not used much, i.e. that most lookups go through the primary key (where there is no clustered index penalty). Note that I wrote "lookups" rather than "queries" here, because the clustered index penalty affects any statement with a WHERE clause, not just SELECT statements. This also implies that their frequent UPDATEs mostly locate rows via the primary key.
 

Finally, there is an omission about queries in the article: it doesn't mention PostgreSQL's limitations with index-only scans, which matter especially for frequently updated tables. On such data, PostgreSQL's index-only scans are close to useless, and this simple problem affects many of my clients. I wrote a blog post describing it in 2011; in 2012 PostgreSQL 9.2 gained limited support for index-only scans (useful mainly for mostly-static data); and in 2014 I revisited the issue at PgCon. Yet Uber doesn't complain about it at all. SELECT speed does not seem to be their concern; my guess is that they solve it by running those queries on the replicas.
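For readers who haven't hit this limitation, here is a short sketch (again using the invented trips table) of what an index-only scan looks like and why it degrades on frequently updated data: PostgreSQL can skip the heap only for pages that the visibility map marks as all-visible, and heavy updates keep invalidating that flag until the next VACUUM.

    CREATE INDEX trips_rider_created_idx ON trips (rider_id, created_at);

    VACUUM trips;   -- refreshes the visibility map

    EXPLAIN (ANALYZE)
    SELECT created_at FROM trips WHERE rider_id = 42;
    -- Right after VACUUM:  Index Only Scan ... Heap Fetches: 0
    -- After many updates:  Heap Fetches keeps growing, and the "index only"
    -- scan ends up visiting the heap for most rows anyway.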

 

Based on Uber's use case as described so far, I get the feeling that a key/value store would be a better fit. And guess what? InnoDB itself is a very solid and popular key/value store. There are several wrappers around InnoDB and some (very limited) SQL front ends; the best known of these products are MySQL and MariaDB. Pardon my sarcasm, but seriously: if you mainly need a key/value store and occasionally want to run a simple SQL query, MySQL or MariaDB is a reasonable choice. I'd guess it is a better choice than most random NoSQL systems if you need at least limited SQL support. Uber, on the other hand, built their own architecture (Schemaless) on top of InnoDB and MySQL.

 

Index rebalancing

One final note on the article's description of indexes: it uses the word "rebalancing" in connection with B-tree indexes and links to the Wikipedia section "Rebalancing after deletion". Unfortunately, that section does not apply to database indexes, because the algorithm it describes requires every node to be at least half full. To improve concurrency, PostgreSQL uses the Lehman-Yao variant of B-trees, which does not have this requirement and therefore allows sparse indexes. As a side note, PostgreSQL does eventually remove completely empty pages from an index (see slide 15 of "Indexing Internals"), but this is really just a side issue.

 

What really worries me is this statement: "An essential aspect of B-trees is that they must be periodically rebalanced ..." Let me clarify: this is not a periodic process. The balance of a B-tree is maintained by every single change to the index (could it be any better?). The article goes on: "... and these rebalancing operations can completely change the structure of the tree as sub-trees are moved to new on-disk locations." If you now think that rebalancing moves a lot of data around, you are mistaken.

The most important maintenance operation in a B-tree is the node split. A node split happens when a node cannot hold a new entry that belongs in it; depending on the node size, this occurs roughly once per hundred inserts or so. When a node is split, a new node is allocated, about half of the full node's entries are moved to the new node, and the new node is linked to the previous node, the next node, and the parent node. This is where the Lehman-Yao algorithm saves a lot of locking. In some cases the new node cannot be registered in the parent right away, because the parent doesn't have enough room for the new child entry; in that case the parent node is split as well, and so on.

 

In the worst case, the splits propagate all the way up to the root node, which is then split too, and a new root node is put on top of it. Only in this case does a B-tree ever become deeper. Note that a root split effectively moves the whole tree down one level and thereby keeps it balanced. Either way, this does not cause a lot of data movement: in the worst case it touches three nodes on each level plus the new root node. And to be clear: indexes in real-world databases rarely exceed five levels, and the worst case, a root split, happens only on the order of once per billion inserts; the other splits never need to traverse the whole tree. So index maintenance is neither periodic nor even particularly heavyweight, and it certainly does not completely change the structure of the tree, even if a few nodes end up at new locations on disk.
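A quick back-of-the-envelope check on those numbers (a sketch assuming a conservative fan-out of about 100 entries per node):

    entries a tree of depth d can hold  ≈  100^d
    depth 4  ≈  100^4  =  100,000,000 rows
    depth 5  ≈  100^5  =  10,000,000,000 rows

The root only splits when the tree gains a level, so even across a billion inserts it happens just a handful of times; it is not routine maintenance.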

 

Physical replication

This is the other major PostgreSQL topic of the article: physical replication. The article brings up index "rebalancing" at all because Uber once hit a replication bug in PostgreSQL that corrupted data on the downstream servers (the bug affected only particular PostgreSQL 9.2 point releases and was fixed long ago).

Because PostgreSQL 9.2 only offers physical replication in core, a replication bug can indeed "render large parts of the tree unusable". In more detail: if a node split is replicated incorrectly, it no longer points to the correct child nodes, and that entire subtree becomes unreachable. That is absolutely true; as the saying goes, "if there's a bug, things break". You don't need to change a lot of data to corrupt the whole tree structure: a single bad pointer is enough.

 

The other physical replication problems mentioned in the Uber article are: 1. excessive replication traffic, partly due to the write amplification caused by updates; 2. long downtime needed for upgrading to a new PostgreSQL version. I can sympathize with the first issue, and I cannot comment on the downtime issue (but there are more details in an announcement on the pgsql-hackers mailing list).

Finally, the article also claims that "Postgres does not have true replica MVCC support". Luckily, the PostgreSQL documentation the article links to explains the problem: the master doesn't know what the replicas are doing and may therefore remove row versions that are still needed to complete queries running on a replica.

 

According to the PostgreSQL documentation, there are two ways to deal with this problem (a configuration sketch follows the list):

  1. Delay applying the replication stream for a configurable timeout so that read transactions on the replica get a chance to finish. If a query does not complete within that time, cancel it and resume applying the replication stream.

  2. Configure the replica to send feedback to the master about the queries it is currently running, so that the master does not vacuum away row versions the replica still needs.

The Uber article uses only the first approach and does not mention the second one at all, which is an awkward omission given that it addresses exactly the complaint they raise.
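For reference, both options correspond to standard settings on a PostgreSQL standby; this is only a sketch, not Uber's actual configuration:

    # postgresql.conf on the replica
    # Option 1: give queries on the standby up to 30 seconds before conflicting
    # WAL replay cancels them.
    max_standby_streaming_delay = 30s

    # Option 2: report the oldest snapshot still in use on the standby back to
    # the primary, so it does not vacuum away row versions the standby needs
    # (at the price of some extra bloat on the primary).
    hot_standby_feedback = on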

 

About the developer

To quote the article's example scenario: "Suppose a developer has some code that needs to email a receipt to a user. Depending on how it is written, the code may implicitly hold a database transaction open until after the email has finished sending. While it is always bad form to let your code hold a database transaction open while waiting on unrelated blocking operations, the reality is that most developers are not database experts and may not understand this problem, especially when using an ORM that hides low-level details like open transactions."

 

Unfortunately, I understand this view and even agree with it. But rather than saying "the vast majority of developers are not database experts", I would say that the vast majority of developers have only a very shallow understanding of databases, and understanding transactions is not expert knowledge: every developer who touches SQL should understand them.

A large part of my work right now is giving SQL training to developers, which I have done for companies of all sizes. If there is one thing I am sure of, it is that most people's knowledge of SQL is very thin. On the "open transaction" issue in particular, I can confirm that hardly any developer knows that read-only transactions exist. Most developers only know that transactions can be used to roll back writes when something fails, and from that they derive the misconception that transactions are irrelevant for reads. I meet developers with this misunderstanding often enough that I have prepared some slides to explain the problem.
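Since read-only transactions are so often overlooked, here is what they look like in plain SQL (using the invented trips table from the earlier sketches); both PostgreSQL and MySQL/InnoDB accept this syntax:

    -- Declare the intent up front: the database knows no data will be changed,
    -- and any accidental write inside the transaction fails with an error.
    START TRANSACTION READ ONLY;
    SELECT count(*)        FROM trips WHERE rider_id = 42;
    SELECT max(created_at) FROM trips WHERE rider_id = 42;
    COMMIT;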

 

About success

Here is one last issue I want to address: the more developers a company hires, the closer the company gets to the average skill level. To exaggerate: if you hired the entire planet, your level would be exactly the average of all people. Hiring more people really just increases the sample size.

Two ways to break this rule are:

  1. Hiring only the best developers is very difficult because you may have to wait a long time to find good people.

  2. Hire mid-level people and train them. This needs a long ramp-up time for new employees and may also require existing employees to spend time training them. The problem with both approaches is time: when your business is growing fast, you don't have much time to wait, so you end up hiring developers who don't know much about databases (2014 empirical data). In other words, for a fast-growing company it is much easier to change the technology than to change the people.

 

Over time, success also changes the requirements on the technology stack. In the early stages, a startup needs technology that is available right away and flexible enough to follow the needs of the business. SQL is a good choice here, because it is genuinely flexible (you can query your data in many different ways) and because it is easy to hire developers who know at least a little SQL. Great, let's get going. For many companies, probably the vast majority, the story ends there: even if they become more successful and the business keeps growing, they may stay within the bounds of a SQL database forever. Not so with Uber.

 

A few lucky startups eventually outgrow SQL. By the time that happens, they have access to far more (almost unlimited?) resources, and then something remarkable happens: they realize they can solve many problems themselves, replace a general-purpose database system, and build their own tailored replacement. This is also how new NoSQL databases come into being; at Uber, it is called Schemaless.

Uber's database selection

By now I believe Uber did not actually replace PostgreSQL with MySQL in the way their article suggests. Rather, they replaced PostgreSQL with their own solution, and that solution happens to be built on top of what MySQL/InnoDB currently offers.

 

The article really explains why MySQL/InnoDB is a better backend for Schemaless than PostgreSQL. Those of you who also use Schemaless should follow Uber's advice. However, the article does not make clear how their requirements have changed since 2013, when they migrated from MySQL to PostgreSQL.

Sadly, articles like this leave the reader with a bad impression - PostgreSQL sucks.
