[SQL must know and know] - Lesson 10 grouping data

Table of contents

data grouping

create group

        Rules for using GROUP BY:

        ALL clause

filter group

        Tip: HAVING supports all WHERE operators

       Difference between HAVING and WHERE

        Using HAVING and WHERE

grouping and sorting

        Don't forget to ORDER BY

SELECT clause order


data grouping

        Grouping divides data into logical sets so that aggregate calculations can be performed on each group.


create group

        Grouping is established using the GROUP BY clause of the SELECT statement. The best way to understand grouping is to look at an example:

SELECT vend_id, COUNT(*) AS num_prods
FROM Products
GROUP BY vend_id;

        

        Because GROUP BY is used, you do not have to specify each group to be evaluated and calculated; the DBMS does that automatically. The GROUP BY clause instructs the DBMS to group the data and then aggregate each group rather than the entire result set. Here, the statement returns one row per vend_id, with num_prods holding the number of products for that vendor.


        Rules for using GROUP BY:

  1. The GROUP BY clause can contain any number of columns, so groups can be nested for more granular control over grouping.
  2. If the GROUP BY clause contains nested groups, data is summarized at the last specified group. In other words, all specified columns are evaluated together when the groups are formed, so you cannot get data back for each individual column level.
  3. Every column listed in GROUP BY must be a retrieved column or a valid expression (but not an aggregate function). If an expression is used in the SELECT, the same expression must be specified in GROUP BY. Aliases cannot be used.
  4. Most SQL implementations do not allow GROUP BY columns with variable-length data types (such as text or memo fields).
  5. Aside from the aggregate calculations, every column in the SELECT statement must be present in the GROUP BY clause.
  6. If the grouping column contains a row with a NULL value, NULL is returned as a group. If there are multiple rows with NULL values, they are all grouped together.
  7. The GROUP BY clause must appear after the WHERE clause and before the ORDER BY clause.
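
        As a rough illustration of rules 1, 2, 5, 6 and 7, the sketch below groups on two columns at once. The DemoProducts table and its rows are made up for this example (they are not the lesson's Products data), and the column types are an assumption that should work on most DBMSs.

```sql
-- Hypothetical sample table and rows, invented for this illustration.
CREATE TABLE DemoProducts (
    prod_id    CHAR(10)     NOT NULL PRIMARY KEY,
    vend_id    CHAR(10),              -- NULL allowed, to illustrate rule 6
    prod_price DECIMAL(8,2) NOT NULL
);

INSERT INTO DemoProducts VALUES ('P1', 'V1', 3.49);
INSERT INTO DemoProducts VALUES ('P2', 'V1', 3.49);
INSERT INTO DemoProducts VALUES ('P3', 'V1', 9.99);
INSERT INTO DemoProducts VALUES ('P4', 'V2', 9.99);
INSERT INTO DemoProducts VALUES ('P5', NULL, 5.00);
INSERT INTO DemoProducts VALUES ('P6', NULL, 5.00);

-- Rules 1 and 2: nested grouping produces one row per distinct
--                (vend_id, prod_price) pair, not one row per vendor.
-- Rule 5: both non-aggregate SELECT columns appear in GROUP BY.
-- Rule 6: the two rows with a NULL vend_id end up in a single group.
-- Rule 7: GROUP BY comes before ORDER BY.
SELECT vend_id, prod_price, COUNT(*) AS num_prods
FROM DemoProducts
GROUP BY vend_id, prod_price
ORDER BY vend_id, prod_price;
```

        With these rows the query forms four groups: V1 at 3.49 (2 products), V1 at 9.99 (1), V2 at 9.99 (1), and the NULL vendor at 5.00 (2). Whether the NULL group sorts first or last depends on the DBMS.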

        ALL clause

        Some SQL implementations, such as Microsoft SQL Server, support an optional ALL clause in GROUP BY. This clause can be used to return all groups, even those with no matching rows (in which case the aggregate will return NULL). Refer to your DBMS documentation to see whether it supports ALL.


filter group

        In addition to grouping data, SQL lets you filter which groups to include and which to exclude. For example, you might want a list of all customers who have placed at least two orders. We've already seen the WHERE clause in action (in Lesson 4), but WHERE won't do the job here, because WHERE filters specific rows, not groups. In fact, WHERE has no concept of a group.

        So, what do you use instead of WHERE? SQL provides another clause for this purpose: the HAVING clause. HAVING is very similar to WHERE. In fact, every type of WHERE condition learned so far can also be used with HAVING. The only difference is that WHERE filters rows, while HAVING filters groups.


        Tip: HAVING supports all WHERE operators

        In Lessons 4 and 5, we learned about WHERE clause conditions (including wildcard conditions and clauses with multiple operators). All the techniques and options learned for WHERE apply to HAVING as well. The syntax is the same; only the keyword is different.

SELECT cust_id, COUNT(*) AS orders
FROM Orders
GROUP BY cust_id
HAVING COUNT(*) >= 2;

        The final line is what matters here: HAVING COUNT(*) >= 2 keeps only the groups whose row count is at least 2, that is, customers with two or more orders. A WHERE clause cannot do this, because the condition depends on an aggregate value that exists only once the rows have been grouped.


       Difference between HAVING and WHERE

        Here is another way to look at it: WHERE filters before the data is grouped, and HAVING filters after the data is grouped. This is an important distinction, because rows excluded by WHERE are not included in any group. That can change the calculated values, which in turn can affect which groups the HAVING clause filters out.

        So, is there ever a need to use both WHERE and HAVING in one statement? Actually, yes. Suppose you want to refine the statement above so that it returns only customers who placed two or more orders in the past 12 months. To do that, add a WHERE clause that keeps only the orders placed in the past 12 months, and then add a HAVING clause that keeps only the groups with two or more rows. To see this combination at work, look at the following example, which lists all suppliers that have two or more products priced at 4 or more:

SELECT vend_id, COUNT(*) AS num_prods
FROM Products
WHERE prod_price >= 4
GROUP BY vend_id
HAVING COUNT(*) >= 2;


        In this statement, the first line is a basic SELECT using an aggregate function, much like the previous examples. The WHERE clause keeps only rows with a prod_price of at least 4. The data is then grouped by vend_id, and the HAVING clause keeps only the groups with a count of 2 or more. Without the WHERE clause, one more row would be retrieved (supplier DLL01, which sells 4 products, all priced below 4):

SELECT vend_id, COUNT(*) AS num_prods
FROM Products
GROUP BY vend_id
HAVING COUNT(*) >= 2;
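
        The 12-month scenario described earlier could be sketched along the following lines. The DemoOrders table, its rows, and the fixed cutoff date are all assumptions made for illustration; in practice the cutoff would be computed with date functions, which differ from one DBMS to another, as does the accepted date literal format.

```sql
-- Hypothetical sample table and rows, invented for this illustration.
CREATE TABLE DemoOrders (
    order_num  INTEGER  NOT NULL PRIMARY KEY,
    order_date DATE     NOT NULL,
    cust_id    CHAR(10) NOT NULL
);

INSERT INTO DemoOrders VALUES (1, '2020-03-01', 'A');
INSERT INTO DemoOrders VALUES (2, '2020-06-01', 'A');
INSERT INTO DemoOrders VALUES (3, '2019-05-01', 'B');  -- before the cutoff
INSERT INTO DemoOrders VALUES (4, '2020-07-01', 'B');
INSERT INTO DemoOrders VALUES (5, '2020-02-01', 'C');

-- WHERE runs first and drops orders older than the (assumed) cutoff date.
-- HAVING then keeps only customers with two or more remaining orders.
SELECT cust_id, COUNT(*) AS orders
FROM DemoOrders
WHERE order_date >= '2020-01-01'
GROUP BY cust_id
HAVING COUNT(*) >= 2;
```

        With these rows only customer A is returned, with 2 orders. Customer B has two orders in total, but one of them is removed by WHERE before grouping, so B's group contains a single row and fails the HAVING test.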

 


        Using HAVING and WHERE

        HAVING is so similar to WHERE that most DBMSs treat them the same way if no GROUP BY is specified. Nevertheless, you should make the distinction yourself: use HAVING only in conjunction with a GROUP BY clause, and use WHERE for standard row-level filtering.


grouping and sorting

        GROUP BY and ORDER BY often accomplish the same thing, but it is important to understand that they are very different. The table below summarizes the differences between them.

        The first difference listed in the table is extremely important. Data grouped with GROUP BY is often output in group order, but that is not always the case, and the SQL specification does not require it. Furthermore, even if a particular DBMS does always sort the data by the GROUP BY columns, you might want the output sorted differently. Just because you group data one way (to obtain group-specific aggregate values) does not mean that the output must be sorted the same way. Always provide an explicit ORDER BY clause, even if it is identical to the GROUP BY clause.

        Put simply, GROUP BY groups the data, but it does not guarantee that the results come back in ascending or descending order. To get a guaranteed order, use the ORDER BY clause.


        Don't forget to ORDER BY

        Generally, when using the GROUP BY clause, the ORDER BY clause should also be given. This is the only way to guarantee that the data is sorted correctly. Never rely solely on GROUP BY to sort data.

        To illustrate the use of GROUP BY and ORDER BY, let's look at an example. The following SELECT statement is similar to the previous examples. It retrieves the order number and the number of items for every order containing three or more items:

SELECT order_num, COUNT(*) AS items
FROM OrderItems
GROUP BY order_num
HAVING COUNT(*) >= 3;

        To sort the output by the number of items ordered, you need to add an ORDER BY clause, as follows: 

SELECT order_num, COUNT(*) AS items
FROM OrderItems
GROUP BY order_num
HAVING COUNT(*) >= 3
ORDER BY items, order_num;

order_num  items
---------  -----
20009      3
20007      5
20008      5

        In this example, the GROUP BY clause groups the data by order number (the order_num column) so that COUNT(*) returns the number of items in each order. The HAVING clause filters the groups so that only orders containing three or more items are returned. Finally, the ORDER BY clause sorts the output by item count and then by order number.


SELECT clause order

        Let's review the order of the clauses in the SELECT statement. The following table lists the clauses learned so far, in the order they must be used in the SELECT statement.
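
        As a sketch of that ordering, the statement below uses every clause covered so far (SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY) in the sequence in which they must appear. The DemoOrderItems table and its rows are made up for this example.

```sql
-- Hypothetical sample table and rows, invented for this illustration.
CREATE TABLE DemoOrderItems (
    order_num  INTEGER NOT NULL,
    order_item INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_num, order_item)
);

INSERT INTO DemoOrderItems VALUES (1, 1, 10);
INSERT INTO DemoOrderItems VALUES (1, 2, 20);
INSERT INTO DemoOrderItems VALUES (1, 3, 5);
INSERT INTO DemoOrderItems VALUES (2, 1, 10);
INSERT INTO DemoOrderItems VALUES (3, 1, 50);
INSERT INTO DemoOrderItems VALUES (3, 2, 100);
INSERT INTO DemoOrderItems VALUES (3, 3, 10);

SELECT order_num, COUNT(*) AS items   -- 1. SELECT:   columns or expressions to return
FROM DemoOrderItems                   -- 2. FROM:     table to retrieve data from
WHERE quantity >= 10                  -- 3. WHERE:    row-level filtering
GROUP BY order_num                    -- 4. GROUP BY: group specification
HAVING COUNT(*) >= 2                  -- 5. HAVING:   group-level filtering
ORDER BY items, order_num;            -- 6. ORDER BY: output sort order
```

        With these rows the query returns order 1 with 2 items, followed by order 3 with 3 items. Order 2 has only one row left after the WHERE filter, so its group is removed by HAVING.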

Origin blog.csdn.net/qq_57163366/article/details/129987595