[PostgreSQL] Starting from scratch: (1) First introduction to PostgreSQL

PostgreSQL database introduction

PostgreSQL is a powerful open source object-relational database system that uses and extends the SQL language and combines it with many features to safely store and scale the most complex data workloads. PostgreSQL's origins date back to 1986 as part of the POSTGRES project at the University of California, Berkeley, and the core platform has seen more than 35 years of active development.

PostgreSQL has earned a strong reputation for its proven architecture, reliability, data integrity, robust feature set, scalability, and the dedication of the open source community behind the software to consistently deliver high-performance and innovative solutions. PostgreSQL runs on all major operating systems, has been ACID compliant since 2001, and has powerful add-ons such as the popular PostGIS geospatial database extender. Not surprisingly, PostgreSQL has become the open source relational database of choice for many people and organizations.

Why use PostgreSQL?

PostgreSQL comes with many features designed to help developers build applications, help administrators protect data integrity and build fault-tolerant environments, and help you manage your data no matter how large or small your data set is. In addition to being free and open source, PostgreSQL is highly extensible. For example, you can define your own data types, build custom functions, and even write code in different programming languages without having to recompile your database.

PostgreSQL tries to conform to the SQL standard wherever doing so does not contradict traditional features or lead to poor architectural decisions. It supports many of the features required by the SQL standard, although sometimes with slightly different syntax or behavior. Further conformance can be expected over time. As of version 16, released in September 2023, PostgreSQL conforms to at least 170 of the 179 mandatory features for SQL:2023 Core conformance. As of this writing, no relational database fully conforms to this standard.

PostgreSQL's slogan is "the world's most advanced open source relational database."

PostgreSQL is often described as the most powerful relational database on the market after Oracle.

The PostgreSQL community is purely community-driven and is not controlled by any commercial company. Many end users and cloud vendors are willing to contribute core code, which gives PostgreSQL rapid release iterations and a rich set of extensions.

Why are so many end users and cloud vendors willing to contribute core code?

  • End users
    • They want the community to last, so that they can keep using a free, sustainable, open source, enterprise-grade database that is not controlled by any commercial company or any country, and can move away from Oracle, DB2, and Sybase;
    • They do not make their money from databases;
    • The more people use PG, the more endorsement it gets, and the safer it is to rely on (which is true in practice);
    • A small contribution brings a large return: a company might fund two R&D engineers to contribute continuously (perhaps one or two million a year), while thousands of people across the PG community are contributing, which is a huge gain for the end user. With a commercial database, on top of license and other fees, the company still has to invest in management, R&D, and outsourcing resources, which can amount to tens of millions or even hundreds of millions a year. The larger the company, the stronger its motivation to contribute to the community. Judging from the trend, the number of large customers contributing code to PG will only increase.
  • Cloud vendors
    • Some open source database vendors have conflicts of interest with cloud vendors and have changed their licenses as a result;
    • The database market is huge;
    • Building a database in-house would be the ideal choice, but it has problems: it requires cultivating an ecosystem, earning market endorsement, and committing a lot of R&D resources, and it may mean reinventing the wheel;

Benefits of building on PostgreSQL:

  1. No need to cultivate your own ecosystem.
  2. No need to reinvent the wheel.
  3. PostgreSQL has a very good code base and is known as the Oracle of the open source world.
  4. It prevents any other vendor from controlling PostgreSQL and leaving you without a say in the market (AWS, Google, IBM, and Microsoft have all become sponsors of the PG community).

Why learn PostgreSQL?

China is currently carrying out a broad program of replacing foreign technology with domestic alternatives. Across defense, government, finance, healthcare, education, and enterprise, non-domestic products such as servers, defense systems, and software are gradually being replaced. Databases are an important part of that software. There are many domestic database products; the following are some commonly used, domestically developed relational databases with a centralized architecture.

Name               Company                       Underlying technology   Version
GaussDB            Huawei                        PostgreSQL              9.6
PolarDB-postgres   Alibaba                       PostgreSQL              9.6
PolarDB-mysql      Alibaba                       MySQL                   5.6
TDSQL-postgres     Tencent                       PostgreSQL              9.6
TDSQL-mysql        Tencent                       MySQL                   5.6
HighgoDatabase     Highgo                        PostgreSQL              9.6
KingbaseES         Kingbase                      PostgreSQL              12
GBase              Nanda General Data (GBase)    PostgreSQL              9.6
DM                 Dameng                        Oracle                  9i (reportedly a leaked source code version)

As of the publication date of this article, the usage shares of foreign and domestic databases are as follows:

Proportion of foreign database usage

Data source: https://db-engines.com/en/ranking/relational+dbms

Domestic database usage proportion

Data source: https://www.modb.pro/dbRank

As the rankings show, most of the top-ranked domestic databases are built on PostgreSQL.

Why are most domestic databases built on PostgreSQL instead of MySQL?

PostgreSQL and MySQL are the two outstanding representatives of open source databases, so the comparison here focuses on their copyright terms. Copyright can be understood here as the license, which is defined directly by the text of each open source license. Let's look at how the two licenses are worded.

PostgreSQL License
The PostgreSQL License is a liberal open source license, similar to the BSD or MIT licenses. Portions of the copyright (1994) belong to the Regents of the University of California, and portions (1996-2020 in the cited text) belong to the PostgreSQL Global Development Group. The core members of the Global Development Group are spread around the world and are not controlled by any corporate entity, which makes PostgreSQL a truly open project.
The BSD license gives users a great deal of freedom. You can freely use the code, modify the source, and re-release the modified code as open source or as proprietary software. It is sometimes called the "living Lei Feng" of open source licensing, after the Chinese byword for selfless generosity.
The BSD license encourages code sharing, but the copyright of the code's authors must be respected. It is friendly to commercial integration because it allows users to modify and redistribute code, and also allows commercial software that uses or builds on BSD code to be released and sold. Many companies prefer the BSD license when choosing open source products, because they keep full control over the third-party code and can modify or extend it when necessary.
PostgreSQL license description: https://www.postgresql.org/about/licence/

MySQL License
MySQL is controlled by Oracle, and it is offered under both the GPL and a commercial license (so-called dual licensing).
The GPL (GNU General Public License) is a copyleft license, and software released under it is open. If a piece of software incorporates GPL software, that software must also be released as open source under the GPL; if it is not open source, it cannot use the GPL software. This applies regardless of whether the software is sold commercially.
If you cannot meet the terms of the GPL, you need to contact Oracle and obtain a commercial license, which leaves the solution you develop bound to Oracle.

Specific constraints:
① Patents may not be applied for on modifications made to MySQL;
② Modifications to MySQL must be made public, and ownership belongs to Oracle;
③ Source code modifications made purely for academic or practice purposes must also comply with the GPL;
④ Oracle's MySQL Enterprise Edition and advanced features involve fees, and Oracle does not allow other closed source products based on MySQL.
Other databases based on MySQL, such as MariaDB, must also follow the GPL or its revised version, GPL v2. The GPL logically conflicts with the commercial license; the commercial license can be understood as a privilege reserved for the company that controls MySQL.
Because the GPL strictly requires that software using GPL libraries must itself be licensed under the GPL, GPL-licensed code is not suitable as a library or as a basis for derivative development in commercial software or in software whose code must stay confidential. From the original GPL through GPL v2 and v3, as well as the LGPL, the license has kept evolving and its wording is relatively complex, which affects how the open source spirit is developed and passed on and easily leads to disputes.

MySQL license description: https://www.mysql.com/about/legal/licensing/oem/
GPL V2 original description: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html

This section draws on [Decryption: Why domestic databases use PostgreSQL instead of MySQL]

PostgreSQL vs. MySQL

Advantages of PostgreSQL

  1. PostgreSQL implements the SQL standard more completely than MySQL, and its feature implementations are more rigorous.
  2. It has fairly complete support for table joins, a fairly complete optimizer, many index types, and strong complex-query capabilities.
  3. PostgreSQL stores the main table as a heap table, while MySQL uses an index-organized table; the heap layout is generally regarded as able to handle larger data volumes than MySQL.
  4. PostgreSQL's primary/standby replication is physical replication. Compared with MySQL's binlog-based logical replication, data consistency is more reliable, replication performance is higher, and the impact on the primary host's performance is smaller.
  5. PostgreSQL supports JSON and other NoSQL-style features, such as native XML support and key-value pairs via HSTORE. It also supports indexing JSON data to speed up access, and the JSONB type became even more capable in version 10.
  6. PostgreSQL is completely free and uses a BSD-style license. If you modify PostgreSQL and then sell it, no one will object. This matters because it means the PostgreSQL database cannot be controlled by any single company. MySQL, by contrast, is now mainly controlled by Oracle Corporation.
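
The difference between a heap table and an index-organized table (item 3 above) can be sketched with a toy model. This is only an illustration under stated assumptions, not how either engine is implemented: Python dictionaries stand in for B-tree indexes, a list position stands in for a tuple ID, and the class names, the `email` column, and the sample rows are all hypothetical.

```python
class HeapTable:
    """Toy PostgreSQL-style layout: rows live in a heap, every index points at a heap slot."""

    def __init__(self):
        self.heap = []         # rows in insertion order; the list position stands in for a tuple ID
        self.pk_index = {}     # primary key -> heap position
        self.email_index = {}  # secondary key -> heap position

    def insert(self, pk, email, payload):
        tid = len(self.heap)
        self.heap.append((pk, email, payload))
        self.pk_index[pk] = tid
        self.email_index[email] = tid

    def find_by_email(self, email):
        # One index lookup, then a direct fetch from the heap.
        return self.heap[self.email_index[email]]


class IndexOrganizedTable:
    """Toy InnoDB-style layout: rows live inside the primary key index itself."""

    def __init__(self):
        self.clustered = {}    # primary key -> full row
        self.email_index = {}  # secondary key -> primary key (not a row position)

    def insert(self, pk, email, payload):
        self.clustered[pk] = (pk, email, payload)
        self.email_index[email] = pk

    def find_by_email(self, email):
        # The secondary index yields a primary key, which needs a second lookup.
        pk = self.email_index[email]
        return self.clustered[pk]


# Load the same two rows into both layouts.
heap_table = HeapTable()
iot = IndexOrganizedTable()
for table in (heap_table, iot):
    table.insert(2, "b@example.com", "second")
    table.insert(1, "a@example.com", "first")
```

Both layouts return the same row for `find_by_email`, but they get there differently: in the heap layout the secondary index entry is a position in the heap, while in the index-organized layout it is a primary key value that must be looked up again. That extra hop is also why primary key design constrains an index-organized table more than a heap table.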

Advantages of MySQL

  1. InnoDB's MVCC mechanism, based on rollback segments, has an advantage over PG's XID-based MVCC mechanism, in which new and old row versions are stored together. Because new and old versions share the same storage, VACUUM must be triggered regularly, which adds extra I/O and object-locking overhead and lowers the database's overall concurrency. Moreover, if VACUUM does not clean up in time, tables can bloat.
  2. MySQL uses index-organized tables. This storage layout is well suited to queries and deletes that match on the primary key, but it places constraints on table design.
  3. MySQL's optimizer is relatively simple, and its system tables, operators, and data types are implemented in a very streamlined way, which makes it well suited to simple queries.
  4. MySQL is more popular in China than PostgreSQL, which has a comparatively small following there.
  5. MySQL's pluggable storage engine mechanism broadens its application scenarios. For example, InnoDB suits transaction processing, while MyISAM suits read-mostly queries over static data.
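
The VACUUM behavior described in the first point above can be made concrete with a toy model of a heap that keeps old and new row versions side by side. This is a minimal sketch under strong simplifying assumptions: every transaction commits immediately, a snapshot simply sees all transaction IDs up to a given number, and the names `ToyHeap`, `TupleVersion`, and `oldest_snapshot_xid` are hypothetical. Real PostgreSQL visibility rules are considerably more involved, and the InnoDB approach of keeping old versions in a separate rollback area is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    key: str
    value: str
    xmin: int                   # transaction ID that created this version
    xmax: Optional[int] = None  # transaction ID that superseded it, if any


class ToyHeap:
    def __init__(self):
        self.versions = []  # old and new versions share the same storage
        self.next_xid = 1

    def _new_xid(self):
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def insert(self, key, value):
        xid = self._new_xid()
        self.versions.append(TupleVersion(key, value, xmin=xid))
        return xid

    def update(self, key, value):
        # An update never overwrites in place: it marks the live version
        # as superseded and appends a new version.
        xid = self._new_xid()
        for v in self.versions:
            if v.key == key and v.xmax is None:
                v.xmax = xid
        self.versions.append(TupleVersion(key, value, xmin=xid))
        return xid

    def read(self, key, snapshot_xid):
        # Simplified visibility: created at or before the snapshot,
        # and not yet superseded as of the snapshot.
        for v in self.versions:
            if v.key == key and v.xmin <= snapshot_xid and (
                v.xmax is None or v.xmax > snapshot_xid
            ):
                return v.value
        return None

    def vacuum(self, oldest_snapshot_xid):
        # Drop versions that no snapshot at or after oldest_snapshot_xid can see.
        before = len(self.versions)
        self.versions = [
            v for v in self.versions
            if v.xmax is None or v.xmax > oldest_snapshot_xid
        ]
        return before - len(self.versions)


heap = ToyHeap()
heap.insert("a", "v1")
heap.update("a", "v2")
heap.update("a", "v3")
```

After the two updates the toy heap holds three versions of one logical row, which is the "data expansion" the list item warns about. Older snapshots can still read the earlier versions until `vacuum` is called with a snapshot boundary that makes them unreachable, at which point the dead versions are removed.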

From the perspective of application scenarios, PG is better suited to demanding enterprise applications (such as finance, telecommunications, ERP, and CRM), but it is not limited to them: PostgreSQL's json, jsonb, hstore, and other data formats are especially suitable for analyzing some big data formats. MySQL is better suited to Internet scenarios with relatively simple business logic and lower data reliability requirements (such as Google, Facebook, and Alibaba). Of course, MySQL's InnoDB engine is now under vigorous development and performs well.


Origin blog.csdn.net/sinat_36528886/article/details/134957163