How Oracle and PostgreSQL Differ in Handling Transactions (Multi-Versioning vs. UNDO)

Around 2015, my work required me to use two document databases, MongoDB and Couchbase, so I regularly visited both official websites to look up information and report bugs. On MongoDB's website I often saw news like "Company X replaced MySQL with MongoDB and significantly improved performance" or "Company Y replaced Oracle with MongoDB and saved a great deal of money"...

 

On Couchbase's website, I would regularly see news like "Company A replaced MongoDB with Couchbase and significantly improved performance" or "Company B replaced MongoDB with Couchbase and greatly boosted performance"...

Advertisements like these can sway a data architect in no time. Seeing such headlines, if you were a data architect, how would you choose a database?

 

Stick with relational MySQL/Oracle, switch to the "better" MongoDB, or go for the "even better" Couchbase? Or, since there are always "even, even better" databases to choose from, why not give one of those a try?

 

I believe the news on these official websites is not fabricated. But it hides a lot of critical information, and it takes not the slightest responsibility for the choices you make based on it.

 

Keep in mind that in 2015, MongoDB supported transactions only within a single document. Put simply, it did not support cross-row transactions. If you believed such news and hastily built an application that needs complex cross-row transactions on top of MongoDB, you might as well start preparing your résumé.

 

Couchbase is similar: it is a combination of CouchDB and Memcached. CouchDB is a document database; pair it with Memcached as the storage engine and you get Couchbase. In effect, it is roughly MongoDB with its data kept in Memcached.

 

The reason it is faster than MongoDB is that its data lives in memory. If MongoDB also had plenty of memory and cached all of its data there, it would be hard to say which one is faster.

 

Moreover, once the data volume exceeds the host's memory and real physical I/O kicks in, Couchbase's performance degrades dramatically, and it is often slower than MongoDB.

 

Choosing a database in a rush, without understanding the inherent nature of these databases, is dangerous. For an individual, having to polish a résumé is a small matter; the damage to the company can be incalculable. In my earlier article "Digg Apocalypse", I described how the American social news site Digg, chasing scale, blindly replaced its underlying database. This led to numerous problems: frequent application errors and pages that often would not load. The company missed an important window of opportunity and was eventually acquired for a mere $500,000 (at its peak, Digg had been valued at $500 million).

 


 

Now let's get back to sorting through the pitfalls of data architecture selection. In this installment we discuss transactions in OLTP.

 

Many people are already familiar with the term. After NoSQL became popular, transactions seemed to become optional. In fact they are not: MongoDB, the leader of NoSQL, has been quietly building transactional capabilities, and in version 4.0 and later (the latest version is now 4.3) it has largely matched the transactional capabilities of a traditional relational database.

 

For OLAP applications, transactions are indeed optional, but for OLTP they are an absolute must.

 

Where does the concept of a transaction come from? It goes back to the 1970s, when relational databases were just being born.

 

Around 1970, E.F. Codd, already a big name in database circles, recognized how inconvenient hierarchical and network databases were and proposed the relational model, ushering in the era of relational databases.

 

(E.F. Codd, father of the relational database and Turing Award winner)

 

Throughout the 1970s, however, relational databases did not develop very well. Codd was busy promoting his relational model and had no time for anything else.

 

At this point a database rookie some 20 years younger than Codd made his debut: James Gray, nicknamed Jim.

(Jim in his youth)

 

Jim was a developer on IBM's System R, IBM's experimental relational database. While developing System R, Jim kept pondering one question:

 

"How can we guarantee database consistency in a way that is more, faster, better, and cheaper?"

 

Jim's big idea was this:

Treat a group of related database operations as a single atomic unit: either they all succeed or they all fail. That is a transaction, and this is where the concept was born.

As long as every transaction is consistent, the database is consistent.

Thus the question of "ensuring database consistency" becomes one of "ensuring transactional consistency."
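To make "all or nothing" concrete, here is a toy Python sketch, assuming a dict stands in for the database and a money transfer stands in for "related operations". The snapshot-and-restore trick and the names `run_atomically`, `debit`, `credit`, and `InsufficientFunds` are my own illustration, not how any real engine works; real databases use logs rather than copying the whole database.

```python
# Toy model (assumption): a dict is the "database", and atomicity is faked by
# taking a snapshot and restoring it on failure. Real engines use logs instead.
from typing import Callable, Dict, List

Operation = Callable[[Dict[str, int]], None]


class InsufficientFunds(Exception):
    """Raised by the hypothetical debit step when the balance is too low."""


def run_atomically(db: Dict[str, int], operations: List[Operation]) -> None:
    """Apply every operation to db, or none of them if any operation fails."""
    snapshot = dict(db)  # remember the old state
    try:
        for op in operations:
            op(db)
    except Exception:
        db.clear()
        db.update(snapshot)  # put the old state back, as if nothing happened
        raise


def debit(account: str, amount: int) -> Operation:
    def op(db: Dict[str, int]) -> None:
        if db[account] < amount:
            raise InsufficientFunds(account)
        db[account] -= amount
    return op


def credit(account: str, amount: int) -> Operation:
    def op(db: Dict[str, int]) -> None:
        db[account] += amount  # KeyError if the account does not exist
    return op


accounts = {"A": 100, "B": 50}
run_atomically(accounts, [debit("A", 30), credit("B", 30)])
print(accounts)  # {'A': 70, 'B': 80}
try:
    # the debit succeeds, then crediting the nonexistent account "C" fails
    run_atomically(accounts, [debit("A", 30), credit("C", 30)])
except KeyError:
    pass
print(accounts)  # {'A': 70, 'B': 80}
```

Outside observers see either the whole transfer or none of it; the half-finished second transfer leaves no trace.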

 

You see, the scope of the problem becomes much smaller. That is how top architects think.

 

Then how do we ensure transactional consistency in a way that is "more, faster, better, and cheaper"?

 

The problem is not simple. A database runs many concurrent transactions that are related to and overlap with one another. Within the same stretch of time, I might be updating the row you want to query, while you are deleting a row that someone else is updating... Such intertwined cases are extremely common.

 

Jim went on to study transactions for his whole career, and won the Turing Award in 1998. The Turing Award is often called the Nobel Prize of computing, and it has been given for database work four times in total. One of those four was for transactions, which shows just how complex and important transactions are.

(Jim Gray giving a talk)

 

Jim later wrote up his research in a book, "Transaction Processing: Concepts and Techniques." If you want to develop a database that guarantees transactional consistency, I recommend reading it.

 

Here is a brief look at Jim's thinking. The two-phase commit protocol already existed, but it was designed for distributed systems and is too loosely coupled; using it to achieve transactional atomicity would be too slow.

 

Jim took a shortcut and proposed the DO-UNDO-REDO protocol. It is explained with this figure from his book "Transaction Processing: Concepts and Techniques":

 

Anyone who works with Oracle, DB2, or SQL Server will get the idea at a glance.

 

Here is my explanation of the figure, based on the book:

DO is the operation itself, such as an UPDATE, INSERT, or DELETE statement. It moves the database from the old state to a new state, and it also produces a log record.

 

UNDO uses the log to take the database from the new state back to the old state.

 

REDO uses the log to move the database from the old state to the new state.
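The three steps can be sketched in a few lines of Python, assuming a dict as the database and a single log whose records hold both the before-image (for UNDO) and the after-image (for REDO). `LogRecord`, `do`, `undo`, and `redo` are hypothetical names; real logs also carry transaction IDs, log sequence numbers, and much more.

```python
# Toy model (assumption): one log record per write, holding both images.
from dataclasses import dataclass
from typing import Any, Dict, List

MISSING = object()  # sentinel meaning "this key did not exist"


@dataclass
class LogRecord:
    key: str
    old: Any  # before-image, used by UNDO
    new: Any  # after-image, used by REDO


def do(db: Dict[str, Any], log: List[LogRecord], key: str, new: Any) -> None:
    """DO: log the change, then move the key from its old state to the new one."""
    log.append(LogRecord(key, db.get(key, MISSING), new))
    db[key] = new


def _apply(db: Dict[str, Any], key: str, value: Any) -> None:
    if value is MISSING:
        db.pop(key, None)
    else:
        db[key] = value


def undo(db: Dict[str, Any], log: List[LogRecord]) -> None:
    """UNDO: walk the log backward, putting each before-image back."""
    for rec in reversed(log):
        _apply(db, rec.key, rec.old)


def redo(db: Dict[str, Any], log: List[LogRecord]) -> None:
    """REDO: walk the log forward, re-applying each after-image."""
    for rec in log:
        _apply(db, rec.key, rec.new)


db: Dict[str, Any] = {"x": 1}
log: List[LogRecord] = []
do(db, log, "x", 2)
do(db, log, "y", 10)
undo(db, log)
print(db)  # {'x': 1}
redo(db, log)
print(db)  # {'x': 2, 'y': 10}
```

UNDO walks backward so that later changes are reverted before earlier ones; REDO walks forward in the original order.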

 

In fact, this figure matches the transaction mechanisms of DB2 and SQL Server exactly: in both, the UNDO log and the REDO log are combined. For example, SQL Server calls it the transaction log, and both rollback and recovery rely on it.

 

The transaction mechanisms of DB2 and SQL Server match the figure in Jim's book exactly because both databases have a direct link to Jim:

 

DB2 can be said to have grown out of IBM's experimental relational database, System R, and Jim was one of System R's main developers.

 

SQL Server goes without saying. Jim did not want to leave San Francisco, where he was used to living, and move to Seattle, so Bill Gates set up a research lab in San Francisco especially for him, with Jim as its director. From there he helped lead the development of the SQL Server database.

 

A small digression: I once saw the question "Why do programmers love to blog?" The top-voted answer was: "Because when you solve an awesome problem, there's nobody around who understands it."

 

In the 1970s there were no blogs, and email lists would not catch on for several more years. Jim solved a series of complex problems around transactions and consistency in System R, yet nobody could appreciate them: the loneliness of a master who stands above the crowd.

 

So what did he do? While developing System R and wrestling with complex transaction problems, Jim found time to write a "blog". Since blogs did not exist yet, he wrote papers instead. In 1976, Jim published a landmark paper, "Granularity of Locks and Degrees of Consistency in a Shared Data Base." It was in this paper that Jim first put forward the concepts of the "transaction" and of consistency.

 

While Jim was busy writing papers, he never expected a "move-stealer" right next door: Larry Ellison. Ellison had already read Codd's papers from IBM and set out to build a relational database, Oracle. IBM was the undisputed big brother of the day; how could following the big brother go wrong?

 

While Ellison was busy stealing moves, Jim had already published his papers on transactions. So, as you can see, Oracle's transactions are also built on the DO-UNDO-REDO protocol.

 

However, Oracle separated UNDO from REDO and gave UNDO its own dedicated UNDO segments. This later influenced MySQL and the Chinese database Dameng: both also separate UNDO out and keep it in UNDO segments.

 

Keeping UNDO and REDO together is Jim's orthodox design, but the separated design is not bad either, and it is hard to say which is stronger.

 

From a transaction standpoint, Oracle, DB2, SQL Server, and Dameng each design their transactions differently.

 

In terms of maturity, Oracle and DB2 are of course more mature and perform better, but SQL Server and Dameng are also good; their performance and functionality fully meet our needs.

 

But in the martial world of databases, besides the DO-UNDO-REDO school there is another major school. Because it implements transactions with a completely different protocol, some operations perform noticeably differently than in the DO-UNDO-REDO camp.

 

This other major school is the multi-version school, founded by another Turing Award winner and grandmaster: Michael Stonebraker.

 

(Michael Stonebraker at the Turing Award ceremony)

 

In the turbulent database world of the 1970s, Michael, unable to bear the loneliness, built a relational database of his own: Ingres. Many databases are descended from Ingres, and PostgreSQL is one of them.

 

Michael accepted Jim's concept of transactions, but he implemented them differently and did not adopt the full DO-UNDO-REDO protocol. REDO is still needed: it is the after-image, used for recovery. UNDO is the before-image of the data, used to restore data to its state before modification.

 

However, restoring the before-image, or serving it to readers, does not necessarily require an UNDO log (or UNDO segment). A multi-version mechanism can do the job as well.

 

Below, using Update as an example, I compare how the multi-version school and the UNDO school process a transaction differently, and then sum up the strengths and weaknesses of each.

 

 

(Pictured: table data before the Update; multi-version school on the left, UNDO school on the right)

 

In the multi-version school, each row of table data carries some version information, including the transaction number and the commit flag. In some multi-version databases, the transaction number is a timestamp marking when the transaction started, rather than a specific sequence number. This is very common in NoSQL/NewSQL databases.

 

Databases in the UNDO camp have an UNDO log. Oracle and MySQL have dedicated UNDO segments; DB2 and SQL Server keep UNDO and REDO together.

Now the user issues an update, and the database receives this Update command:

 

          Update test set col='BBB' where id=4

 

While executing this SQL, the transaction processing and the related table data look like this:

 

(Pictured above: multi-version school on the left, UNDO school on the right)

 

The multi-version school first copies the row with ID 4 to a new location, sets the new row's transaction number to 1899 and its commit flag to N (meaning the transaction is uncommitted), and then changes the updated column's value to "BBB". If another session queries the row with ID 4, it can tell directly from the commit flag that this transaction has not committed, so it searches back for the row with ID 4 that has the largest transaction number among rows marked as committed. That row is the one that satisfies the consistency requirement.
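The multi-version steps above can be sketched in Python. This follows the simplified description in this article (a commit flag plus a transaction number per row version), not any real engine's visibility rules; `RowVersion`, `read_committed`, `mvcc_update`, and `commit` are hypothetical names, and the old row's value "AAA" and transaction number 1800 are made up for the example.

```python
# Toy model (assumption): the table is a list of row versions, and readers see
# the committed version with the largest transaction number.
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class RowVersion:
    id: int
    col: str
    txn: int         # transaction number that wrote this version
    committed: bool  # commit flag; False plays the role of "N"


def read_committed(table: List[RowVersion], row_id: int) -> Optional[RowVersion]:
    """Among committed versions of row_id, return the one with the largest txn."""
    visible = [v for v in table if v.id == row_id and v.committed]
    return max(visible, key=lambda v: v.txn, default=None)


def mvcc_update(table: List[RowVersion], row_id: int, new_col: str, txn: int) -> RowVersion:
    """Copy the current committed version to a new slot, then modify only the copy."""
    current = read_committed(table, row_id)
    if current is None:
        raise KeyError(row_id)
    new_version = replace(current, col=new_col, txn=txn, committed=False)
    table.append(new_version)  # the original row stays untouched
    return new_version


def commit(table: List[RowVersion], txn: int) -> None:
    """Flip the commit flag on every version written by txn."""
    for v in table:
        if v.txn == txn:
            v.committed = True


table = [RowVersion(id=4, col="AAA", txn=1800, committed=True)]
mvcc_update(table, 4, "BBB", txn=1899)
print(read_committed(table, 4).col)  # AAA: txn 1899 has not committed yet
commit(table, 1899)
print(read_committed(table, 4).col)  # BBB
```

Note that the whole row is copied even though only `col` changes; with more columns, the copy grows accordingly.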

 

The UNDO school first writes the before-image to the UNDO area and then modifies the target column in the original row. I won't describe this school in detail; I believe most readers already know it well.
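For comparison, a toy Python sketch of the UNDO school's Update, assuming each row is a dict keyed by column name and the UNDO log is a plain list of (row id, column, before-image) records. `undo_update` and `rollback` are hypothetical names; real UNDO records also carry transaction and location information.

```python
# Toy model (assumption): update in place, saving only the changed column's old value.
from typing import Any, Dict, List, Tuple

Table = Dict[int, Dict[str, Any]]
UndoRecord = Tuple[int, str, Any]  # (row id, column name, before-image value)


def undo_update(table: Table, undo_log: List[UndoRecord],
                row_id: int, column: str, new_value: Any) -> None:
    """Save only the modified column's before-image, then update the row in place."""
    undo_log.append((row_id, column, table[row_id][column]))
    table[row_id][column] = new_value


def rollback(table: Table, undo_log: List[UndoRecord]) -> None:
    """Apply UNDO records newest first to restore every before-image."""
    while undo_log:
        row_id, column, before = undo_log.pop()
        table[row_id][column] = before


table: Table = {4: {"id": 4, "col": "AAA", "note": "unchanged"}}
undo_log: List[UndoRecord] = []
undo_update(table, undo_log, 4, "col", "BBB")
print(table[4]["col"], undo_log)  # BBB [(4, 'col', 'AAA')]
rollback(table, undo_log)
print(table[4]["col"], undo_log)  # AAA []
```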

 

Comparing the two, readers will notice that even if a table has ten columns and only one is updated, the multi-version school must copy the entire row to a new location, while the UNDO school only has to copy the original value of the modified column into the UNDO log. By this measure, the UNDO school has the advantage for Update operations.
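The arithmetic behind this comparison, as a toy Python function. It assumes fixed column widths and ignores row headers, version fields, and UNDO record overhead, so the numbers are only illustrative; `bytes_copied_per_update` is a hypothetical name.

```python
# Toy model (assumption): count only column data, no headers or log overhead.
from typing import Dict, Set


def bytes_copied_per_update(column_widths: Dict[str, int], updated: Set[str]) -> Dict[str, int]:
    """Bytes of column data copied by one UPDATE under each school."""
    return {
        # multi-version: the whole row is copied to a new location
        "multi_version": sum(column_widths.values()),
        # UNDO: only the before-images of the modified columns go to the UNDO log
        "undo": sum(column_widths[c] for c in updated),
    }


# Ten columns of 20 bytes each, and the UPDATE touches only one of them.
widths = {f"c{i}": 20 for i in range(10)}
print(bytes_copied_per_update(widths, {"c3"}))  # {'multi_version': 200, 'undo': 20}
```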

 

Of course, as a Turing Award winner, Michael must have given his multi-version school strengths of its own.

 

 

Multi-version fans, put down your 40-meter broadswords and don't get worked up. Let's compare Insert next:

 

In the multi-version school, Insert is very simple: the new row is inserted directly into the table.

 

By comparison, Insert in the UNDO school is slightly more complicated. It has to write the new row's location into the UNDO log, so that on rollback the database can use that location to find the new row.
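The two Insert paths just described can be sketched in Python, assuming a table is a Python list and a row's "location" is its list index. The function names are hypothetical. In this toy model, rolling back a multi-version insert simply means never committing the new version, so no extra record is written.

```python
# Toy model (assumption): a row's "location" is its index in a Python list.
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def mvcc_insert(table: List[Row], row: Row, txn: int) -> None:
    """Multi-version Insert: append the row with its version info, nothing more."""
    table.append({**row, "txn": txn, "committed": False})


def undo_insert(table: List[Row], undo_log: List[Tuple[str, int]], row: Row) -> None:
    """UNDO-school Insert: append the row and also log where it went."""
    table.append(dict(row))
    undo_log.append(("inserted_at", len(table) - 1))


def rollback_inserts(table: List[Row], undo_log: List[Tuple[str, int]]) -> None:
    """Use the logged locations, newest first, to find and remove the new rows."""
    while undo_log:
        _, position = undo_log.pop()
        del table[position]


mv_table: List[Row] = []
mvcc_insert(mv_table, {"id": 5, "col": "EEE"}, txn=1900)
print(mv_table)  # [{'id': 5, 'col': 'EEE', 'txn': 1900, 'committed': False}]

undo_table: List[Row] = [{"id": 1, "col": "AAA"}]
log: List[Tuple[str, int]] = []
undo_insert(undo_table, log, {"id": 5, "col": "EEE"})
print(log)  # [('inserted_at', 1)]
rollback_inserts(undo_table, log)
print(undo_table)  # [{'id': 1, 'col': 'AAA'}]
```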

 

Although the UNDO school is only slightly more complicated here, OLTP applications aim for short, simple, fast operations, where a SQL call may take only a few milliseconds to a dozen or so. In that setting, the extra UNDO work is no small cost.

 

So there you have it: Michael's multi-version school wins this round and evens the score.

 

For Delete, I won't draw another diagram. The UNDO school copies the entire original row into the UNDO log. The multi-version school handles deletion in several different ways. The most basic and simplest approach is to copy the entire row to a new location on delete and add a deletion flag.
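The two Delete paths just described can be sketched in Python, assuming the multi-version table is a list of row versions (dicts) and the UNDO-school table is a dict keyed by row id. `mvcc_delete_copy` and `undo_delete` are hypothetical names, and the transaction numbers are made up.

```python
# Toy model (assumption): both schools copy a whole row on Delete, just to different places.
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def mvcc_delete_copy(table: List[Row], row_id: int, txn: int) -> None:
    """Basic multi-version Delete: copy the whole latest version and add a delete flag."""
    latest = max((v for v in table if v["id"] == row_id), key=lambda v: v["txn"])
    table.append({**latest, "txn": txn, "committed": False, "deleted": True})


def undo_delete(table: Dict[int, Row], undo_log: List[Tuple[str, int, Row]], row_id: int) -> None:
    """UNDO-school Delete: copy the entire original row into the UNDO log, then remove it."""
    undo_log.append(("deleted", row_id, dict(table[row_id])))
    del table[row_id]


mv_table: List[Row] = [{"id": 4, "col": "AAA", "txn": 1800, "committed": True, "deleted": False}]
mvcc_delete_copy(mv_table, 4, txn=1901)
print(len(mv_table), mv_table[-1]["deleted"])  # 2 True

undo_table: Dict[int, Row] = {4: {"id": 4, "col": "AAA"}}
undo_log: List[Tuple[str, int, Row]] = []
undo_delete(undo_table, undo_log, 4)
print(undo_table, undo_log)  # {} [('deleted', 4, {'id': 4, 'col': 'AAA'})]
```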

 

This is consistent with the multi-version school's philosophy: never modify the original row; for every DML operation, only copy, and modify the new row.

 

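Many NoSQL databases do exactly this, but mature relational multi-version databases such as PostgreSQL are not so crude. PostgreSQL keeps some transaction flags in the original row, so deleting a row only requires marking it as deleted, which avoids copying the deleted row to a new location. I won't go into the details here. In short, in PostgreSQL, Delete, like Insert, is very resource-efficient.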
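A simplified Python sketch of delete-by-marking, assuming each row header carries two transaction fields. The names `xmin` and `xmax` borrow PostgreSQL's system columns of the same names (the inserting and the deleting transaction), but the visibility check here is heavily simplified: real PostgreSQL uses snapshots and several more rules. `HeapTuple`, `mark_deleted`, and `is_visible` are hypothetical names.

```python
# Simplified model (assumption): visibility depends only on a set of committed txns.
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class HeapTuple:
    id: int
    col: str
    xmin: int                   # transaction that inserted this tuple
    xmax: Optional[int] = None  # transaction that deleted it; None means "not deleted"


def mark_deleted(tup: HeapTuple, txn: int) -> None:
    """Delete by stamping the deleting transaction on the original tuple; nothing is copied."""
    tup.xmax = txn


def is_visible(tup: HeapTuple, committed: Set[int]) -> bool:
    """Visible if the inserter committed and no committed transaction deleted it."""
    inserted = tup.xmin in committed
    deleted = tup.xmax is not None and tup.xmax in committed
    return inserted and not deleted


row = HeapTuple(id=4, col="AAA", xmin=1800)
committed = {1800}
mark_deleted(row, 1902)
print(is_visible(row, committed))  # True: the deleting transaction has not committed
committed.add(1902)
print(is_visible(row, committed))  # False
```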

 

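The UNDO school grew out of the closed-source, commercial System R, so it is widely used in relational, closed-source, commercial databases, as mentioned earlier: Oracle, DB2, and Dameng, plus the open-source MySQL.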

 

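The multi-version school originated with Ingres, which was open source. Its implementation is also relatively simpler, since it does not need to maintain a dedicated UNDO space. So the multi-version school is widely used in open-source databases and in non-relational databases such as NoSQL/NewSQL.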

 

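Among relational databases, PostgreSQL seems to be the only well-known member. But almost every NoSQL/NewSQL database that supports transactions belongs to the multi-version school.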

 

All right, now we can sum up. Which scenarios does each of these two schools suit best?

 

A quick summary:


 

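◎ In the multi-version school, Insert/Delete do not need to copy the original row data, so they are simple and fast, but Update is heavier, especially for tables with many columns where each Update changes only a few of them.

 

(Note: Delete without copying the original row applies only to PostgreSQL. Most multi-version NoSQL/NewSQL databases still copy the original row to a new location on Delete.)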

 

◎ The UNDO school's strength is balance: every DML operation includes an UNDO step, so all of them perform about the same.

 

 

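Applying this to real applications: suppose you have a log-style OLTP application where a large number of concurrent sessions insert lots of data every second (which differs from OLAP, where a few sessions insert lots of data; we'll cover OLAP another time), and the data is never or rarely updated. Then a multi-version database such as PostgreSQL is a great fit.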

 

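If your transaction requirements are less strict, some NoSQL/NewSQL databases are also worth considering. For example, with few transactions, you could consider MongoDB. MongoDB only recently began supporting fully ACID transactions across multiple tables (called collections in MongoDB), so in terms of maturity it lags behind PostgreSQL.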

 

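If you don't need cross-table or cross-row transactions, or don't need transactions at all, your options widen further: HBase, Cassandra, and others all offer good insert performance.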

 

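If you have a transactional OLTP application with lots of queries and operations like A transferring money to B, that is, updating A's balance and then updating B's balance, a database from the UNDO school is a better fit. After all, the UNDO school uses fewer resources for Update.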

 

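Sometimes, though, the boundaries between applications are not so clear-cut. For example, a large application may have both log-style and transactional features, with the data mixed together. In that case you can hardly write the data twice and keep it in two different databases (though in extreme cases you could).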

 

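So how do you choose? It depends on which features matter more. If the log-style features matter more, choose a multi-version database; if the transactional features matter more, choose an UNDO-school database. If both matter, I suggest preferring a mature UNDO-school database, because its performance is more balanced across operations and will not swing between fast and slow.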

 

Choosing between schools is relatively easy. But what about choosing within the same school?

 

Within the same school, it's hard to say publicly which one is better; every vendor's TPC-C benchmark results look excellent.

 

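They all use the same basic ideas, and quality comes down to the developers' programming skill. The people who build databases are all good programmers, like the one below:

(Everyone who has met me says I'm a textbook good programmer too :) )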

 

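So within the same school, if money is no object, choose the most mature database. If cost matters, Chinese domestic databases and open-source databases are indeed good choices. Today, domestic databases and long-established open-source databases such as MySQL/PostgreSQL are already quite mature.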

 

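Beyond transactions, differences in data storage, data organization, lock granularity, and so on also lead to performance differences. For example, Oracle tables are heap tables, and heap tables are unordered. MySQL InnoDB tables are index-organized tables, which must be kept sorted by the index. Sorting adds some performance overhead, but it speeds up primary-key lookups, and so on. We'll cover these non-transactional factors later.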
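A toy Python sketch of the heap versus index-organized trade-off, assuming a sorted Python list stands in for InnoDB's clustered B+tree index. The class names are hypothetical, and the point is only the shape of the cost: the heap appends cheaply but must scan to find a key, while the ordered table pays to keep order on every insert and can then binary-search by primary key.

```python
# Toy model (assumption): a sorted list plus bisect stands in for a clustered B+tree.
import bisect
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (primary key, payload)


class HeapTable:
    """Unordered: inserts append to the end; a primary-key lookup must scan."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self.rows.append(row)

    def get(self, pk: int) -> Optional[Row]:
        return next((r for r in self.rows if r[0] == pk), None)


class IndexOrganizedTable:
    """Kept sorted by primary key: inserts pay to keep order; lookups binary-search."""

    def __init__(self) -> None:
        self.keys: List[int] = []
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        i = bisect.bisect_left(self.keys, row[0])
        self.keys.insert(i, row[0])
        self.rows.insert(i, row)

    def get(self, pk: int) -> Optional[Row]:
        i = bisect.bisect_left(self.keys, pk)
        if i < len(self.keys) and self.keys[i] == pk:
            return self.rows[i]
        return None


heap, iot = HeapTable(), IndexOrganizedTable()
for r in [(3, "c"), (1, "a"), (2, "b")]:
    heap.insert(r)
    iot.insert(r)
print(heap.rows)  # [(3, 'c'), (1, 'a'), (2, 'b')]
print(iot.rows)   # [(1, 'a'), (2, 'b'), (3, 'c')]
print(heap.get(2), iot.get(2))  # (2, 'b') (2, 'b')
```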

 

Well, this article is long enough already. This installment covers only transactions; the rest will have to wait for the next one.

 

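The UNDO school and the multi-version school, like the Shaolin and Wudang sects of martial-arts lore, ruled the martial world for decades. Application vendors chose whichever suited their needs, and for a time all was calm.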

 

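In the 2000s, one day Jim, founder of the UNDO school, was meditating in his cave when his heart suddenly stirred. Counting on his fingers, Jim knew his time was near, so he set sail: "From now on my little boat shall vanish; on rivers and seas I'll spend the rest of my days."

 

(Note: In early 2007, Jim sailed out to sea in his boat to scatter his mother's ashes and then disappeared. The Coast Guard, volunteers, and friends from academia joined the search, even bringing in satellites and other advanced technology, but after several days they had found nothing. Many people still did not give up, until five years and four months later (May 16, 2012), when he was declared dead.)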

 

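Michael, founder of the multi-version school, was also advanced in years and no longer concerned himself with the affairs of the martial world. With one of the two grandmasters gone and the other aged, their hold on the martial world weakened greatly. Just then, a martial-arts manual called NoSQL stirred up a bloody storm across the martial world once again.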

 

As for how the database world changed after NoSQL's return, stay tuned for the next installment.

 

Reposted from: https://mp.weixin.qq.com/s/6RiVgNp6T-2CnfBy8rpHvA


Origin www.cnblogs.com/xibuhaohao/p/11325468.html