PostgreSQL introduction, usage, forward and inverted indexes, spatial search, users and roles

About PostgreSQL

  • PostgreSQL is a free, open-source object-relational database management system (ORDBMS), released under the permissive PostgreSQL License, which is similar to the BSD license.
  • PostgreSQL 9.0 added support for 64-bit Windows, asynchronous streaming replication, and Hot Standby.
  • PostgreSQL 12 is a mainstream version in production environments.

BSD license and GPL license

BSD license: you can freely use and modify the source code, and you can release the modified code as either open-source or proprietary software.
GPL license: software that incorporates GPL-licensed code and is distributed must itself be released as open source under the GPL; if you do not want to open your source, you cannot build on GPL code. MySQL is controlled by Oracle, and its Community Edition is licensed under the GPL.

Comparison of PostgreSQL and MySQL

  • PG offers more index types than MySQL.
  • PG's primary/standby streaming replication is physical replication, whereas MySQL replication is logical replication based on the binlog. (PostgreSQL 10 and later also support logical replication.)
  • PostgreSQL is completely free under a BSD-style license; MySQL is GPL-licensed and controlled by Oracle.
  • PG stores table data in a heap table, whereas MySQL (InnoDB) uses an index-organized table, which lets PG support larger data volumes than MySQL.

In summary, PostgreSQL suits strict enterprise scenarios, while MySQL suits Internet scenarios with relatively simple business logic and lower data reliability requirements (it is used at companies such as Google, Facebook, and Alibaba).

Installing PostgreSQL on Windows

Download link: PostgreSQL download

  1. Double-click the downloaded exe file to launch the installer.

  2. Change the installation path if needed.

  3. Select the components to install; if you are unsure, select all of them.

  4. Set the data directory for the database.

  5. Set a password for the superuser.

  6. Set the port number; the default can be used as is.

  7. Keep clicking Next until the final screen, then uncheck the option shown there.

  8. Open pgAdmin 4.

  9. Click Servers > PostgreSQL 10 on the left, enter the password, and click OK.

  10. Open SQL Shell (psql).

PostgreSQL remote access

  1. Open the data subdirectory of the PostgreSQL installation directory.

  2. Edit the pg_hba.conf file and add a new line in the IPv4 section: host all all 0.0.0.0/0 md5

  3. Go to Control Panel > System and Security > Windows Firewall and turn off the firewall (or, more safely, allow inbound connections on the PostgreSQL port), then restart the PostgreSQL service.

  • In business development, PostgreSQL is mostly operated through client connection tools rather than the command line. I use Navicat.
  • Remote connections can fail for many reasons; most of them can be solved by searching online.
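
To make the address column of the pg_hba.conf line more concrete, here is a minimal Python sketch of how a CIDR range decides which clients a host rule applies to. The helper names parse_host_rule and rule_covers and the 192.168.1.0/24 rule are hypothetical, and the parsing assumes exactly five whitespace-separated fields; this is an illustration, not PostgreSQL's actual matching code.

```python
import ipaddress


def parse_host_rule(line):
    """Split a simplified pg_hba.conf 'host' line into named fields."""
    conn_type, database, user, address, method = line.split()
    return {
        "type": conn_type,
        "database": database,
        "user": user,
        "network": ipaddress.ip_network(address),
        "method": method,
    }


def rule_covers(rule, client_ip):
    """Return True if the client address falls inside the rule's CIDR range."""
    return ipaddress.ip_address(client_ip) in rule["network"]


# The rule from the article, plus a hypothetical narrower rule for comparison.
open_rule = parse_host_rule("host all all 0.0.0.0/0 md5")
lan_rule = parse_host_rule("host all all 192.168.1.0/24 md5")

# 0.0.0.0/0 covers every IPv4 address; the /24 rule covers one subnet only.
print(rule_covers(open_rule, "203.0.113.7"))
print(rule_covers(lan_rule, "203.0.113.7"))
print(rule_covers(lan_rule, "192.168.1.20"))
```

A narrower range like the second rule limits password logins to one subnet, which is usually a safer choice than 0.0.0.0/0 outside of a test machine.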

Basic usage of PostgreSQL

Log in

In business development we mostly connect with Navicat and rarely from the command line, but the command-line form is:

psql -U dbuser -d exampledb -h 127.0.0.1 -p 5432

Database operations

-- Create a database
CREATE DATABASE mydb;

-- List all databases
\l

-- Switch to another database
\c mydb


-- Drop a database
DROP DATABASE <dbname>;

Database table operations

Table field types

  1. Numeric types
  • smallint: 2 bytes, small-range integer, -32768 to +32767
  • integer: 4 bytes, the usual choice for integers, -2147483648 to +2147483647
  • bigint: 8 bytes, large-range integer, -9223372036854775808 to +9223372036854775807
  • decimal: variable length, user-specified precision, exact; up to 131072 digits before the decimal point and up to 16383 digits after it
  • numeric: equivalent to decimal
  • double precision: 8 bytes, variable precision, inexact; about 15 decimal digits of precision

In business code, double precision is generally avoided; decimal is the better choice because it avoids precision errors.
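
The precision problem is easy to reproduce outside the database. The following small Python sketch uses Python's float (an IEEE 754 double, the same kind of 8-byte binary floating-point value) and the standard decimal module as a stand-in for an exact decimal type; it only illustrates the effect and is not PostgreSQL code.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is slightly off from 0.3.
float_sum = 0.1 + 0.2

# Decimal arithmetic keeps the digits exactly as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)
print(decimal_sum == Decimal("0.3"))
```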

  2. Character types
  • char(size), character(size): fixed-length string; size is the number of characters stored, and shorter values are padded with spaces on the right
  • varchar(size), character varying(size): variable-length string; size is the maximum number of characters stored
  • text: variable-length string with no declared length limit
  3. Date/time types
  • timestamp: date and time
  • date: date only, no time of day
  • time: time of day only

These three groups are the most common; there are also geometric types, the boolean type, and others.

Table operations

In business development, tables are usually created through a visual client tool; the equivalent commands are:


-- Create a table
CREATE TABLE test(id int, body varchar(100));

-- Insert a row
INSERT INTO test(id, body) VALUES (1, 'hello,postgresql');

-- List all tables in the current database
\d

-- Show a table's structure (like DESC in MySQL)
\d test

Primary keys: PostgreSQL implements auto-incrementing columns with sequences. The smallserial, serial, and bigserial pseudo-types create the sequence for you and behave much like the AUTO_INCREMENT attribute in MySQL.

  • SMALLSERIAL: 2 bytes, range 1 to 32,767
  • SERIAL: 4 bytes, range 1 to 2,147,483,647
  • BIGSERIAL: 8 bytes, range 1 to 9,223,372,036,854,775,807
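
The upper bounds in this list follow directly from the storage size: the values are signed integers, so n bytes give a maximum of 2^(8n-1) - 1. A short Python check of that arithmetic (the function name serial_max is just for illustration):

```python
def serial_max(num_bytes):
    """Largest value of a signed integer stored in num_bytes bytes."""
    return 2 ** (8 * num_bytes - 1) - 1


for name, size in [("smallserial", 2), ("serial", 4), ("bigserial", 8)]:
    print(name, serial_max(size))
```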

-- Create a table
CREATE TABLE COMPANY(
   ID  SERIAL PRIMARY KEY,
   NAME           TEXT      NOT NULL,
   AGE            INT       NOT NULL,
   ADDRESS        CHAR(50),
   SALARY         REAL
);

-- Insert rows (ID is filled in by the sequence)
INSERT INTO COMPANY (NAME,AGE,ADDRESS,SALARY)
VALUES ( 'Paul', 32, 'California', 20000.00 );

INSERT INTO COMPANY (NAME,AGE,ADDRESS,SALARY)
VALUES ('Allen', 25, 'Texas', 15000.00 );
-- Query
SELECT * FROM COMPANY WHERE id = 1;
-- Update
UPDATE COMPANY SET age = 33 WHERE id = 1;

PostgreSQL's syntax is largely the same as MySQL's. In business development you mostly write CRUD statements, and operations such as creating tables are more efficient in a visual tool.

Schema

A PostgreSQL schema can be viewed as a named collection of tables.
A schema can also contain views, indexes, data types, functions, operators, and so on.
The same object name can be used in different schemas without conflict; for example, schema1 and myschema can both contain a table named mytable.
Advantages of using schemas:
● Multiple users can share one database without interfering with each other.
● Database objects can be organized into logical groups for easier management.
● Objects of third-party applications can be placed in separate schemas so that their names do not collide with other objects.
Schemas are similar to directories at the operating system level, except that schemas cannot be nested.

-- Create a schema
CREATE SCHEMA myschema;

CREATE TABLE myschema.company(
   ID   INT              NOT NULL,
   NAME VARCHAR (20)     NOT NULL,
   AGE  INT              NOT NULL,
   ADDRESS  CHAR (25),
   SALARY   DECIMAL (18, 2),
   PRIMARY KEY (ID)
);

-- Drop a schema (only works if the schema is empty)
DROP SCHEMA myschema;

-- Drop a schema together with all objects it contains
DROP SCHEMA myschema CASCADE;

Once the schemas exist, you can create tables with the same name in two different schemas, much like having a database inside a database.
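
One way to picture this is as nested namespaces. The Python sketch below models a database as a dictionary of schemas, each holding its own tables; the data and the resolve helper are hypothetical and only illustrate why schema1.mytable and myschema.mytable do not clash.

```python
# database -> schema -> table name -> rows (a hypothetical in-memory model)
database = {
    "schema1": {"mytable": [{"id": 1}]},
    "myschema": {"mytable": [{"id": 99}]},
}


def resolve(qualified_name):
    """Look up a table by its schema-qualified name, e.g. 'myschema.mytable'."""
    schema, table = qualified_name.split(".")
    return database[schema][table]


# Same table name, two different schemas, two different sets of rows.
print(resolve("schema1.mytable"))
print(resolve("myschema.mytable"))
```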

Indexes on tables

Unique index and ordinary index

Unique index:

CREATE UNIQUE INDEX "idx_dev_id_user_id" ON "myschema"."device" USING btree (
  "deviceid",
  "userid"
);

Ordinary (non-unique) index:

CREATE INDEX "id_dev_id" ON "myschema"."device" USING btree (
  "deviceid"
);

Both indexes use a B-tree, which keeps keys in sorted order. The target entry can be found quickly by walking down the tree, which greatly reduces the number of I/O operations; without a usable index, the query falls back to a full table scan.

[Figure: structure of a B-tree index]
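
The benefit of keeping keys sorted can be sketched in a few lines of Python. This is only an analogy: a real B-tree is a multi-level page structure, whereas the sketch uses binary search over a sorted list (the standard bisect module). The function names and the data are made up for illustration.

```python
import bisect

# Sorted keys stand in for the ordered entries of a B-tree index.
device_ids = list(range(0, 1_000_000, 2))


def full_scan(keys, target):
    """Compare against every key in turn; return (found, comparisons made)."""
    for comparisons, key in enumerate(keys, start=1):
        if key == target:
            return True, comparisons
    return False, len(keys)


def sorted_lookup(keys, target):
    """Binary search: each step halves the remaining range."""
    pos = bisect.bisect_left(keys, target)
    return pos < len(keys) and keys[pos] == target


def range_lookup(keys, low, high):
    """Return all keys with low <= key <= high using the sort order."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return keys[start:end]


print(full_scan(device_ids, 999_998))
print(sorted_lookup(device_ids, 999_998))
print(range_lookup(device_ids, 10, 16))
```

The scan has to touch every key to reach the last one, while the sorted lookup needs only about 20 halving steps for 500,000 keys. The same ordering is what makes range conditions cheap.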

hash index

A hash index stores entries in a hash table. On lookup, the value in the query condition is hashed, and the resulting hash code is used to locate the matching entries directly. The drawback is that only equality conditions (=, IN) are supported; range queries are not.

CREATE INDEX "idx_name" ON "myschema"."person" USING hash (
  "name"
);

This index type is rarely used in business development: many business scenarios need fuzzy or range searches, which a hash index cannot support.
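
The trade-off can be illustrated with a Python dictionary, which is itself a hash table. The rows and function names below are hypothetical; the point is only that equality lookups go straight to the entry, while a range condition gets no help from the hash and has to inspect every key.

```python
# Hypothetical (row id, name) pairs from a person table.
people = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "bob")]

# Hash index: name -> list of row ids.
hash_index = {}
for row_id, name in people:
    hash_index.setdefault(name, []).append(row_id)


def equal_lookup(name):
    """name = 'x': one hash probe finds the matching row ids."""
    return hash_index.get(name, [])


def range_lookup(low, high):
    """low <= name <= high: hash order is meaningless, so check every key."""
    return sorted(
        row_id
        for name, row_ids in hash_index.items()
        if low <= name <= high
        for row_id in row_ids
    )


print(equal_lookup("bob"))
print(range_lookup("a", "bz"))
```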

GIN index

  • GIN stands for Generalized Inverted Index.
  • It handles values of data types that are not atomic but are composed of elements.
  • A GIN index consists of a B-tree of elements, with a B-tree or flat list of TIDs linked to the leaf entries of that B-tree.
  • It is used in full-text search scenarios to address poor full-text search performance.
  • Combined with the pg_trgm extension, it also lets patterns such as LIKE '%xxx%' use an index, which a B-tree index cannot do.
  1. Add the pg_trgm extension
CREATE EXTENSION pg_trgm;
  2. Index the column with the trigram operator class (a plain text column has no default GIN operator class, so gin_trgm_ops must be named)
CREATE INDEX "idx_addres" ON "myschema"."person" USING gin (
  "address" gin_trgm_ops
);
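
To show why trigrams help with patterns like '%xxx%', here is a simplified Python sketch. It assumes single-word values, lowercases them, and pads each word with two leading spaces and one trailing space before cutting it into three-character pieces, which is how pg_trgm is documented to treat a word. The sample addresses and function names are made up, and the real extension does more (word splitting, handling of non-alphanumeric characters).

```python
def trigrams(word):
    """Trigrams of one word, padded with two leading and one trailing space."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def fragment_trigrams(fragment):
    """Trigrams of a search fragment; no padding, since it may match mid-word."""
    text = fragment.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


# Hypothetical single-word address values keyed by row id.
addresses = {1: "Beijing", 2: "Nanjing", 3: "Jingbei"}

# Inverted index: trigram -> set of row ids whose value contains it.
trigram_index = {}
for row_id, value in addresses.items():
    for gram in trigrams(value):
        trigram_index.setdefault(gram, set()).add(row_id)


def search_contains(fragment):
    """Emulate LIKE '%fragment%': index lookup first, then recheck the rows."""
    grams = fragment_trigrams(fragment)
    if grams:
        candidates = set.intersection(
            *(trigram_index.get(gram, set()) for gram in grams)
        )
    else:
        # Fragments shorter than three characters give no trigrams to look up.
        candidates = set(addresses)
    return sorted(
        row_id for row_id in candidates
        if fragment.lower() in addresses[row_id].lower()
    )


print(sorted(trigrams("cat")))
print(search_contains("bei"))
print(search_contains("njin"))
```

Only rows that contain every trigram of the fragment are candidates, so the expensive string comparison runs on a small subset instead of the whole table.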

forward index

In a forward index, the whole column value is the key, and the entry points to the entire row.
For example, when searching for the name "zhangsan", the key of the forward index is "zhangsan" and the value is the entire record whose name is "zhangsan".

Primary key indexes, ordinary indexes, and unique indexes are all forward indexes.

Inverted index

  • An inverted index uses words or phrases as keys. The entry for each key records every document in which that word or phrase appears, storing the document ID and the positions where the word occurs in that document.

  • Because the set of documents for each word changes dynamically, building and maintaining an inverted index is more complicated. At query time, however, all documents matching a keyword are obtained in a single lookup, so queries are more efficient than with a forward index.

  • In full-text retrieval, fast query response is the most critical property. Index building happens in the background, so although it is relatively slow, it does not affect the responsiveness of the search engine as a whole.

  • [Figure: structure of an inverted index]

  • GIN (Generalized Inverted Index) is an index structure that stores a collection of (key, posting list) pairs, where key is a key value and the posting list is the set of locations where the key appears. For example, ('hello', '14:2 23:4') means that hello appears at the two locations 14:2 and 23:4; in PG these locations are the TIDs of tuples.

  • A single column value may be parsed into several keys when it is indexed, so the TID of one tuple may appear in the posting lists of several keys.

  • With this structure, the tuples containing a given keyword can be found quickly, which makes GIN especially suitable for full-text search; PG's GIN index was in fact developed to support full-text search.
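
The difference between the two index styles can be shown with plain Python dictionaries. In this sketch the row ids and texts are hypothetical: the forward index keys on the whole column value, while the inverted index keys on each word and keeps a posting list of (row id, word position) pairs.

```python
# Hypothetical rows: row id -> text column value.
rows = {
    14: "hello postgres",
    23: "say hello to gin",
}

# Forward index: the whole value is the key.
forward_index = {value: row_id for row_id, value in rows.items()}

# Inverted index: each word is a key with a posting list of (row id, position).
inverted_index = {}
for row_id, value in rows.items():
    for position, word in enumerate(value.split(), start=1):
        inverted_index.setdefault(word, []).append((row_id, position))

# The forward index only answers exact-value lookups.
print(forward_index.get("hello postgres"))
print(forward_index.get("hello"))

# The inverted index finds every row containing the word in one lookup.
print(inverted_index["hello"])
```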

GiST index

  • GiST (Generalized Search Tree) is, like the B-tree, a balanced search tree.
  • A B-tree is used for equality and range searches.
  • Some real-world scenarios need to store multi-dimensional data such as geographic locations, spatial positions, or image data, and often need to test whether a point lies within a certain area, that is, to test "containment" of geographic locations. A GiST index can be used for such queries.

Use cases

  1. Geometric types: support location search and sorting by distance.
  2. Range types: support location search.
  3. Spatial types (PostGIS): support location search and sorting by distance.
    I have not yet come across a concrete business scenario for these.

Simple example

1. Create a test table:

create table company(id int, location point);

2. Create an index on location (on the same table created in step 1; note that a quoted name with a trailing space, such as "company ", would refer to a different table):

CREATE INDEX "idx_location" ON "company" USING gist (
  "location"
);

3. Insert 100,000 rows of random data:

insert into company select generate_series(1,100000), point(round((random()*1000)::numeric, 2), round((random()*1000)::numeric, 2));

4. Query:

select * from company where circle '((100,100) 50)'  @> location;

This finds all points that lie within a circle of radius 50 centered on (100,100).
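
The condition itself is a plain distance test. The Python sketch below spells it out for a few hypothetical points, assuming that points exactly on the boundary count as inside; it checks every point one by one, which is the work a GiST index lets the database avoid for most rows.

```python
import math


def circle_contains(center, radius, point):
    """True if the point is no farther than radius from the center."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.hypot(dx, dy) <= radius


center, radius = (100.0, 100.0), 50.0

# Hypothetical rows: id -> location.
points = {
    1: (130.0, 140.0),  # distance 50, on the boundary
    2: (140.0, 140.0),  # within 50 on each axis, but about 56.6 away
    3: (151.0, 100.0),  # 51 away along the x axis
}

matches = [pid for pid, loc in points.items() if circle_contains(center, radius, loc)]
print(matches)
```

Point 2 shows that the query is not a "50 above and below" box test: it is within 50 units on each axis but still outside the circle.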

Use explain to view the execution plan:

explain (analyze,verbose,timing,costs,buffers) select * from company where circle '((100,100) 50)'  @> location;


Next, check the execution plan of the paginated query:

explain (analyze,verbose,timing,costs,buffers) select * from company where circle '((100,100) 50)'  @> location ORDER BY id  limit  10 OFFSET 11;

The plan contains three kinds of nodes: Bitmap Index Scan, Bitmap Heap Scan, and Sort.

Paginated search

In business development, searches usually involve pagination, and here the usual PostgreSQL form differs from MySQL's: instead of limit xxx, xxx, it uses limit xx offset xx. OFFSET is the number of rows to skip, so with 10 rows per page the first page is:

select * from company where circle '((100,100) 50)'  @> location ORDER BY id  limit  10 OFFSET 0;

The second page returns rows 11 to 20:

select * from company where circle '((100,100) 50)'  @> location ORDER BY id  limit  10 OFFSET 10;
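
Since OFFSET counts the rows to skip, the value for a 1-based page number is (page - 1) * page_size. A minimal Python sketch of that arithmetic, with list slicing standing in for the database and made-up helper names:

```python
def limit_offset(page, page_size):
    """Translate a 1-based page number into (LIMIT, OFFSET) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return page_size, (page - 1) * page_size


def fetch_page(rows, page, page_size):
    """Emulate ORDER BY ... LIMIT ... OFFSET ... on an already sorted list."""
    limit, offset = limit_offset(page, page_size)
    return rows[offset:offset + limit]


row_ids = list(range(1, 36))  # 35 hypothetical rows, already ordered by id

print(limit_offset(1, 10))
print(limit_offset(2, 10))
print(fetch_page(row_ids, 2, 10))
print(fetch_page(row_ids, 4, 10))
```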

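User operations

-- Create a user and set a password (the user name is an identifier, so it is not single-quoted)
CREATE USER username WITH PASSWORD 'password';
CREATE USER test WITH PASSWORD 'test';

-- Change a user's password
ALTER USER username WITH PASSWORD 'password';

-- Grant an account all privileges on a database
GRANT ALL PRIVILEGES ON DATABASE dbname TO username;
-- Grant all privileges on database mydb to test
GRANT ALL PRIVILEGES ON DATABASE mydb TO test;
-- At this point the user still cannot read or write tables; table privileges must be granted as well
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO test;
-- Note: this statement must be run while connected to the database in question

-- Revoke all privileges on a database from an account
REVOKE ALL PRIVILEGES ON DATABASE mydb FROM test;

-- Drop a user
DROP USER test;

-- List users
\du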

The first entry in the pg_hba.conf configuration means: when a local user connects through a Unix socket, use peer authentication.

# "local" is for Unix domain socket connections only
local   all             all                                     peer
  • With peer authentication, you log in as the operating system user of the machine PostgreSQL runs on.
    In peer mode the client must be on the same machine as PostgreSQL. As long as the current system user name matches the PostgreSQL user name being logged in as, the login succeeds.
    This is why, right after deploying PostgreSQL, you can switch to the system's postgres user and run psql directly: the current system user is named postgres, and a PostgreSQL user named postgres also exists.

PostgreSQL role management

PostgreSQL does not distinguish between users and roles. "CREATE USER" is an alias of "CREATE ROLE", and the two commands are almost identical. The only difference is that a role created with "CREATE USER" has the LOGIN attribute by default, whereas a role created with "CREATE ROLE" does not.

postgres=# CREATE ROLE david;  -- no LOGIN attribute by default
CREATE ROLE
postgres=# CREATE USER sandy;  -- has the LOGIN attribute by default
CREATE ROLE
postgres=# \du
                             List of roles
 Role name |                   Attributes                   | Member of 
-----------+------------------------------------------------+-----------
 david     | Cannot login                                   | {}
 postgres  | Superuser, Create role, Create DB, Replication | {}
 sandy     |                                                | {}

postgres=# 
postgres=# SELECT rolname from pg_roles ;
 rolname  
----------
 postgres
 david
 sandy
(3 rows)

postgres=# SELECT usename from pg_user;         -- david was created without the LOGIN attribute, so it is not listed as a user
 usename  
----------
 postgres
 sandy
(2 rows)

postgres=#
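
The session above can be mirrored in a toy Python model. This is only a sketch of the behaviour as described here (one list of roles, with users being the roles that can log in); the function names imitate the SQL commands but are otherwise hypothetical, and the real system catalogs hold far more detail.

```python
# Single store of roles: name -> attributes.
roles = {}


def create_role(name, login=False):
    """CREATE ROLE: no LOGIN attribute unless asked for."""
    roles[name] = {"login": login}


def create_user(name):
    """CREATE USER: the same as CREATE ROLE, but LOGIN is on by default."""
    create_role(name, login=True)


def alter_role_with_login(name):
    """ALTER ROLE ... WITH LOGIN."""
    roles[name]["login"] = True


def list_roles():
    """Every role, like selecting rolname from pg_roles."""
    return sorted(roles)


def list_users():
    """Only roles that can log in, like selecting usename from pg_user."""
    return sorted(name for name, attrs in roles.items() if attrs["login"])


create_role("postgres", login=True)
create_role("david")
create_user("sandy")

print(list_roles())
print(list_users())

alter_role_with_login("david")
print(list_users())
```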

Update role attributes:

postgres=# ALTER ROLE bella WITH LOGIN;
ALTER ROLE
postgres=# \du
                             List of roles
 Role name |                   Attributes                   | Member of 
-----------+------------------------------------------------+-----------
 bella     | Create DB                                      | {}
 david     |                                                | {}
 postgres  | Superuser, Create role, Create DB, Replication | {}
 renee     | Create DB                                      | {}
 sandy     |                                                | {}

postgres=#

Role attributes

  • login: only roles with the LOGIN attribute can be used as the initial role name for a database connection.
  • superuser: database superuser.
  • createdb: permission to create databases.
  • createrole: allows the role to create or drop other ordinary roles (but not superusers).
  • password: only matters when the client authentication method requires a password at login, such as md5 or password.
  • replication: a role attribute used for streaming replication; it is usually granted to a dedicated role.

Common commands in command line mode

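\password: set a password.
\q: quit.
\h: show help for an SQL command, for example \h select.
\?: list psql commands.
\l: list all databases.
\c [database_name]: connect to another database.
\d: list all tables in the current database.
\d [table_name]: show the structure of a table.
\du: list all users.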

Summary

  1. PostgreSQL is more powerful than MySQL, and the syntax is close, so CRUD can be picked up quickly.
  2. It has unique, ordinary, and hash indexes as well, plus GIN and GiST indexes, so it covers a wider range of business scenarios.
  3. Oracle costs money and PostgreSQL is free; in strict enterprise scenarios Oracle's share is likely to shrink as it is gradually replaced by PostgreSQL.
  4. MySQL + PostgreSQL looks like the future trend, and developers should understand both.
  5. More advanced topics remain to be covered and call for continued learning.

Origin blog.csdn.net/yaoyaochengxian/article/details/131975671