PostgreSQL database introduction 2: usage

2.1. Introduction

This chapter gives an overview of how to perform simple operations using SQL. It is meant only as an introduction, not as a complete SQL tutorial. Many books have been written on SQL, including Understanding the New SQL and A Guide to the SQL Standard. Be aware that some PostgreSQL language features are extensions to the standard.

In the following examples, we assume that you have created a database named mydb, as described in the previous chapter, and have started psql.

The examples in this manual can also be found in the directory src/tutorial/ of the PostgreSQL source distribution (binary distributions of PostgreSQL might not compile these files). To use these files, first change to that directory and run make:

   $ cd ..../src/tutorial
   $ make

This creates a script and compiles a C file containing user-defined functions and types. To start this tutorial, proceed as follows:

   $ cd ..../src/tutorial
   $ psql -s mydb
   ...
   mydb=> \i basics.sql

The \i command reads commands from the specified file. The -s option of psql puts you in single-step mode, which pauses before sending each statement to the server. The commands used in this section are in the file basics.sql.

2.2. Concept

PostgreSQL is a relational database management system (RDBMS). This means it is a system for managing data stored in relations. Relation is essentially a mathematical term for table. Storing data in tables is so commonplace today that it may seem inherently obvious, but there are a number of other ways of organizing databases. Files and directories on Unix-like operating systems form an example of a hierarchical database. A more modern development is the object-oriented database.

Each table is a named collection of rows. Each row of a given table has the same set of named fields, and each field is of a specific data type. Although fields have a fixed order in each row, it is important to remember that SQL does not guarantee the order of the rows within the table in any way (although they can be explicitly sorted for display).

Tables are grouped into databases, and the collection of databases managed by a single PostgreSQL server instance constitutes a database cluster.

2.3. Create a new table

You can create a table by declaring the name of the table and all the field names and their types:

   CREATE TABLE weather (
       city            varchar(80),
       temp_lo         int,           -- low temperature
       temp_hi         int,           -- high temperature
       prcp            real,          -- precipitation
       date            date
   );

You can type this into psql with the line breaks. psql recognizes that the command is not terminated until the semicolon.

Whitespace (that is, spaces, tabs, and newlines) can be used freely in SQL commands. This means you can type the command aligned differently than above, or even all on one line. Two dashes ("--") introduce a comment. Whatever follows them is ignored up to the end of the line. SQL is case-insensitive about keywords and identifiers, except when identifiers are double-quoted to preserve their case (not done above).
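As a small illustration of the whitespace and case rules above, the following statements are a sketch against the weather table just created. The double-quoted name "Weather" in the last statement is hypothetical and is there only to show how quoting preserves case.

```sql
-- Keywords and unquoted identifiers are case-insensitive,
-- so these two statements refer to the same table and fields.
SELECT city, temp_lo FROM weather;
select CITY, Temp_Lo from WEATHER;

-- A double-quoted identifier keeps its case: "Weather" is a different
-- name from weather, so this statement fails unless a table was
-- actually created under the quoted name "Weather".
SELECT city FROM "Weather";
```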

varchar(80) specifies a data type that can store arbitrary character strings up to 80 characters long. int is the normal integer type. real is a type for storing single-precision floating-point numbers. The date type should be self-explanatory. (Yes, the field of type date is also named date. This may be convenient or confusing; you decide.)

PostgreSQL supports the standard SQL types int, smallint, real, double precision, char(N), varchar(N), date, time, timestamp, and interval, as well as other general-purpose types and a rich set of geometric types. PostgreSQL can be customized with an arbitrary number of user-defined data types. Consequently, type names are not keywords in the syntax, except where required to support special cases in the SQL standard.

The second example stores the names of cities and their associated geographic locations:

   CREATE TABLE cities (
       name            varchar(80),
       location        point
   );

The point data type is an example of a data type unique to PostgreSQL.

Finally, if you no longer need a table, or want to recreate it differently, you can remove it with the following command:

   DROP TABLE tablename;

2.4. Add rows to the table

The following INSERT command is used to add rows to the table:

   INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');

Note that all data types use rather obvious input formats. Constants that are not simple numeric values must be surrounded by single quotes ('), as in the example. The date type is actually quite flexible in what it accepts, but for this tutorial we will stick to the unambiguous format shown here.

The point type requires a coordinate pair as input, as follows:

   INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');

The syntax used so far requires you to remember the order of the fields. An alternative syntax allows you to list the fields explicitly:

   INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
       VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');

If you wish, you can list the fields in a different order or omit some of them, for example when the precipitation is unknown:

   INSERT INTO weather (date, city, temp_hi, temp_lo)
       VALUES ('1994-11-29', 'Hayward', 54, 37);

Many developers believe that it is better to list the fields explicitly than to rely on the implicit order.

Please enter all the commands shown above so that you have data available in the sections that follow.

You can also use the COPY command to load large amounts of data from text files. This is usually faster, because the COPY command is optimized for this kind of work, but it is less flexible than INSERT. For example:

   COPY weather FROM '/home/user/weather.txt';

Here the source file must be accessible on the machine running the backend server, not the client, because the backend server reads the file directly. You can read more about the COPY command in the COPY section.
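When the data file lives on the client machine instead, psql offers the \copy meta-command, which reads the file on the client side and sends the rows to the server. The sketch below assumes a hypothetical file weather.txt in psql's current directory, written in COPY's default text format (one row per line, fields separated by tabs, \N for a missing value).

```sql
-- Server-side load: the path is resolved on the server machine.
COPY weather FROM '/home/user/weather.txt';

-- Client-side load from within psql: the path is resolved on the client.
-- weather.txt is a hypothetical file name. \copy is a psql meta-command,
-- not an SQL statement, so it is written on one line without a semicolon.
\copy weather FROM 'weather.txt'
```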

2.5. Query a table

To retrieve data from a table, you query the table. The SQL SELECT statement is used for this. The statement is divided into a select list (the part that lists the fields to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional qualification (the part that specifies any restrictions). For example, to retrieve all rows of the weather table, type:

   SELECT * FROM weather;

Here * is shorthand for "all columns". [1] The same result can be obtained with the following query:

   SELECT city, temp_lo, temp_hi, prcp, date FROM weather;

And the output should be:

        city      | temp_lo | temp_hi | prcp |    date
   ---------------+---------+---------+------+------------
    San Francisco |      46 |      50 | 0.25 | 1994-11-27
    San Francisco |      43 |      57 |    0 | 1994-11-29
    Hayward       |      37 |      54 |      | 1994-11-29
   (3 rows)

You can write arbitrary expressions in the select list, not just simple column references. For example, you can write:

   SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;

This should give:

        city      | temp_avg |    date
   ---------------+----------+------------
    San Francisco |       48 | 1994-11-27
    San Francisco |       50 | 1994-11-29
    Hayward       |       45 | 1994-11-29
   (3 rows)

Notice how the AS clause is used to relabel the output column. (The AS clause is optional.)

A query can be "qualified" by adding a WHERE clause that specifies which rows are wanted. The WHERE clause contains a Boolean (truth value) expression, and only rows for which the expression is true are returned. The usual Boolean operators (AND, OR, and NOT) are allowed in the qualification. For example, the following query retrieves the weather of San Francisco on rainy days:

   SELECT * FROM weather
       WHERE city = 'San Francisco' AND prcp > 0.0;

Result:

        city      | temp_lo | temp_hi | prcp |    date
   ---------------+---------+---------+------+------------
    San Francisco |      46 |      50 | 0.25 | 1994-11-27
   (1 row)
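One detail worth noting here is the Hayward row, whose prcp field was never filled in. A comparison such as prcp > 0.0 does not evaluate to true for a missing (NULL) value, so that row cannot be selected by a comparison on prcp. The following is a minimal sketch of how such rows can be found explicitly, assuming the three rows inserted earlier:

```sql
-- Rows whose precipitation was never recorded.
-- IS NULL is the test for a missing value; "prcp = NULL" would not match.
SELECT city, date
    FROM weather
    WHERE prcp IS NULL;

-- Boolean operators can be combined freely, for example:
SELECT *
    FROM weather
    WHERE NOT city = 'Hayward' OR temp_lo < 40;
```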

You can request that the results of a query be returned in sorted order:

   SELECT * FROM weather
       ORDER BY city;
        city      | temp_lo | temp_hi | prcp |    date
   ---------------+---------+---------+------+------------
    Hayward       |      37 |      54 |      | 1994-11-29
    San Francisco |      43 |      57 |    0 | 1994-11-29
    San Francisco |      46 |      50 | 0.25 | 1994-11-27

In this example, the sort order is not fully specified, so you might get the San Francisco rows in either order. With the following statement, however, you always get the result shown above:

   SELECT * FROM weather
       ORDER BY city, temp_lo;

You can request that duplicate rows be removed from the result of a query:

   SELECT DISTINCT city
       FROM weather;
        city
   ---------------
    Hayward
    San Francisco
   (2 rows)

Here again, the order of the result rows may vary. You can combine DISTINCT and ORDER BY to get consistent results: [2]

   SELECT DISTINCT city
       FROM weather
       ORDER BY city;

[1] Although SELECT * is useful for ad hoc queries, it is widely considered bad style in production code, because adding a field to the table changes the results.

[2] In some database systems, including older versions of PostgreSQL, DISTINCT is implemented in a way that sorts the rows, which makes ORDER BY redundant. The SQL standard does not require this, however, and current PostgreSQL does not guarantee that DISTINCT returns the rows in sorted order.

2.6. Joins between tables

So far, our queries have accessed only one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query. For example, suppose you want to list all the weather records together with the location of the associated city. To do that, we need to compare the city field of each row of the weather table with the name field of all rows in the cities table, and select the pairs of rows where these values match.

Note: This is only a conceptual model. The join is usually performed in a more efficient manner than actually comparing each possible pair of rows, but this is invisible to the user.

This task can be achieved with the following query:

   SELECT *
       FROM weather, cities
       WHERE city = name;
        city      | temp_lo | temp_hi | prcp |    date    |     name      | location
   ---------------+---------+---------+------+------------+---------------+-----------
    San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
    San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
   (2 rows)

Observe two aspects of the result set:

There is no result row for the city of Hayward. This is because there is no matching entry for Hayward in the cities table, so the join ignores the unmatched rows in the weather table. We will see shortly how this can be fixed.

Two fields contain the city name. This is correct, because the lists of fields of the weather and cities tables are concatenated. In practice this is undesirable, though, so you will probably want to list the output fields explicitly rather than use *.
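A query along those lines, listing the output fields explicitly, could be sketched as follows. It assumes the weather and cities tables defined earlier, in which no field name occurs in both tables:

```sql
-- Same join as before, but with an explicit field list,
-- so the city name appears only once in the output.
SELECT city, temp_lo, temp_hi, prcp, date, location
    FROM weather, cities
    WHERE city = name;
```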

Exercise: Try to work out what this query means when the WHERE clause is omitted.

Because the fields all have different names, the parser automatically finds out which table each one belongs to. If the two tables had duplicate field names, you would need to qualify the field names to show which one you meant, as in:

   SELECT weather.city, weather.temp_lo, weather.temp_hi,
          weather.prcp, weather.date, cities.location
       FROM weather, cities
       WHERE cities.name = weather.city;

It is widely considered good style to qualify all field names in a join query, so that the query does not fail if a duplicate field name is later added to one of the tables.

Join queries of the kind seen so far can also be written in this alternative form:

   SELECT *
       FROM weather INNER JOIN cities ON (weather.city = cities.name);

This syntax is not as commonly used as the one above, but we show it here to help you understand the topics that follow.

Now we will figure out how to get the Hayward records back in. What we want the query to do is scan the weather table and, for each row, find the matching rows in the cities table. If no matching row is found, we want some "empty values" to be substituted for the fields of the cities table. This kind of query is called an outer join. (The joins we have seen so far are inner joins.) The command looks like this:

   SELECT *
       FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
        city      | temp_lo | temp_hi | prcp |    date    |     name      | location
   ---------------+---------+---------+------+------------+---------------+-----------
    Hayward       |      37 |      54 |      | 1994-11-29 |               |
    San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
    San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
   (3 rows)

This query is called a left outer join, because every row of the table on the left-hand side of the join operator appears in the output at least once, whereas the table on the right-hand side contributes only those rows that match some row of the left-hand table. When a left-hand row is output for which there is no right-hand match, the fields of the right-hand table are filled with NULL.

Exercise: There are also right outer joins and full outer joins. Try to find out what those do.
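As a starting point for experimenting, the two other outer join forms can be written against the same tables as sketched below. Running them and comparing the results with the left outer join above is a reasonable way to see which side's unmatched rows are kept.

```sql
-- RIGHT OUTER JOIN: every row of the right-hand table (cities) is kept.
SELECT *
    FROM weather RIGHT OUTER JOIN cities ON (weather.city = cities.name);

-- FULL OUTER JOIN: unmatched rows from both tables are kept.
SELECT *
    FROM weather FULL OUTER JOIN cities ON (weather.city = cities.name);
```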

We can also join a table against itself. This is called a self join. As an example, suppose we want to find all the weather records that are in the temperature range of other weather records. To do so, we need to compare the temp_lo and temp_hi fields of each row of the weather table with the temp_lo and temp_hi fields of all other rows of the weather table. We can do this with the following query:

   SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
       W2.city, W2.temp_lo AS low, W2.temp_hi AS high
       FROM weather W1, weather W2
       WHERE W1.temp_lo < W2.temp_lo
       AND W1.temp_hi > W2.temp_hi;
        city      | low | high |     city      | low | high
   ---------------+-----+------+---------------+-----+------
    San Francisco |  43 |   57 | San Francisco |  46 |   50
    Hayward       |  37 |   54 | San Francisco |  46 |   50
   (2 rows)

Here we have relabeled the weather table as W1 and W2 to distinguish the left-hand and right-hand sides of the join. You can also use such aliases to save some typing in other queries, for example:

   SELECT *
       FROM weather w, cities c
       WHERE w.city = c.name;

You will often come across such abbreviations in the future.

2.7. Aggregate functions

Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum), and min (minimum) over a set of rows.
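A minimal sketch that applies all five aggregates to the weather table at once is shown below. The output column names (readings, lowest_low, and so on) are made up for this illustration. Apart from count(*), these aggregates skip NULL inputs, so the missing prcp value for Hayward does not contribute to the sum.

```sql
SELECT count(*)     AS readings,      -- number of rows
       min(temp_lo) AS lowest_low,    -- smallest low temperature
       max(temp_hi) AS highest_high,  -- largest high temperature
       avg(temp_lo) AS avg_low,       -- mean low temperature
       sum(prcp)    AS total_prcp     -- total precipitation, NULLs skipped
    FROM weather;
```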

As an example, we can find the highest low-temperature reading among all records with:

   SELECT max(temp_lo) FROM weather;
    max
   -----
     46
   (1 row)

If we want to know in which city (or cities) that reading occurred, we might try:

   SELECT city FROM weather WHERE temp_lo = max(temp_lo);     -- WRONG

But this will not work, because the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause determines which rows are included in the aggregate calculation, so it must be evaluated before aggregate functions are computed.) However, the query can usually be restated to get the desired result; here we can use a subquery:

   SELECT city FROM weather
       WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
        city
   ---------------
    San Francisco
   (1 row)

This is OK, because the subquery is an independent computation that calculates its own aggregate separately from what happens in the outer query.

Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the maximum low temperature observed in each city with:

   SELECT city, max(temp_lo)
       FROM weather
       GROUP BY city;
        city      | max
   ---------------+-----
    Hayward       |  37
    San Francisco |  46
   (2 rows)

This gives us one output row per city. Each aggregate result is computed over the rows matching that city. We can filter these grouped rows using HAVING:

   SELECT city, max(temp_lo)
       FROM weather
       GROUP BY city
       HAVING max(temp_lo) < 40;
     city   | max
   ---------+-----
    Hayward |  37
   (1 row)

This gives the same results for only those cities whose temp_lo values are all below 40. Finally, if we only care about cities whose names begin with "S", we can use:

   SELECT city, max(temp_lo)
       FROM weather
       WHERE city LIKE 'S%'            -- (1)
       GROUP BY city
       HAVING max(temp_lo) < 40;

(1) LIKE does pattern matching, as explained in Section 9.7.

It is important to understand the interaction between aggregates and the SQL WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that does not use aggregates, but it is seldom useful. The same condition could be used more efficiently at the WHERE stage.)

In the previous example, we can apply the city name restriction in WHERE, since it needs no aggregate. This is more efficient than adding the restriction to HAVING, because we avoid doing the grouping and aggregate calculations for all rows that fail the WHERE check.
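To make the contrast concrete, here is a sketch of the same city name restriction written both ways against the weather table. Both forms are legal and select the same groups; the second is accepted only because city is a grouping field, and it is the form the paragraph above advises against.

```sql
-- Restriction in WHERE: rows for other cities are discarded
-- before any grouping or aggregation takes place.
SELECT city, max(temp_lo)
    FROM weather
    WHERE city LIKE 'S%'
    GROUP BY city;

-- Restriction in HAVING: every city is grouped and aggregated first,
-- and only then are the groups not starting with "S" thrown away.
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING city LIKE 'S%';
```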

2.8. Update

You can update existing rows with the UPDATE command. Suppose you discover that all temperature readings after November 28 are 2 degrees too high. You can correct the data as follows:

   UPDATE weather
       SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
       WHERE date > '1994-11-28';

Look at the new state of the data:

   SELECT * FROM weather;
        city      | temp_lo | temp_hi | prcp |    date
   ---------------+---------+---------+------+------------
    San Francisco |      46 |      50 | 0.25 | 1994-11-27
    San Francisco |      41 |      55 |    0 | 1994-11-29
    Hayward       |      35 |      52 |      | 1994-11-29
   (3 rows)

2.9. Delete

Data rows can be deleted from the table with the DELETE command. Assuming you are no longer interested in Hayward's weather, you can delete those rows from the table in the following way:

   DELETE FROM weather WHERE city = 'Hayward';

All weather records belonging to Hayward are removed.

   SELECT * FROM weather;
        city      | temp_lo | temp_hi | prcp |    date
   ---------------+---------+---------+------+------------
    San Francisco |      46 |      50 | 0.25 | 1994-11-27
    San Francisco |      41 |      55 |    0 | 1994-11-29
   (2 rows)

Be careful with statements of the following form:

   DELETE FROM tablename;

Without a qualification, DELETE removes all rows from the given table, leaving it empty. The system will not ask for confirmation before doing this!
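Since there is no confirmation step, one cautious habit is to try a destructive statement inside a transaction block and inspect the result before making it permanent. The sketch below uses the standard BEGIN, ROLLBACK, and COMMIT commands, which this chapter has not otherwise introduced, and is meant only as an illustration of the idea.

```sql
BEGIN;                                        -- start a transaction block

DELETE FROM weather WHERE city = 'Hayward';   -- the change is not yet permanent

SELECT * FROM weather;                        -- inspect what would remain

ROLLBACK;                                     -- undo the delete
-- Use COMMIT instead of ROLLBACK to keep the change.
```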

Origin blog.csdn.net/qq_37061368/article/details/112978506