Use SQL for data analysis like Excel

Excel is the most commonly used tool in data analysis. It handles data cleaning and preprocessing as well as the most common classification, filtering, subtotal, and pivot-table operations, and all of these operations can also be done in SQL. SQL can not only read data from the database but also return the required results directly through its functions and statements, which is far more efficient than doing the same calculations in a client application.

1 Duplicate data processing

Find duplicate records


SELECT * FROM user 
WHERE (nick_name,password) IN
(
SELECT nick_name,password 
FROM user 
group by nick_name,password 
having count(nick_name)>1
);
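The duplicate check above can be tried without a MySQL server. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL; the helper name find_duplicates and the sample rows are made up for illustration.

```python
import sqlite3


def find_duplicates(rows):
    """Return every row whose (nick_name, password) pair occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, nick_name TEXT, password TEXT)"
    )
    conn.executemany("INSERT INTO user VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT * FROM user
        WHERE (nick_name, password) IN (
            SELECT nick_name, password
            FROM user
            GROUP BY nick_name, password
            HAVING COUNT(*) > 1
        )
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: ids 1 and 3 share the same name and password.
sample = [(1, "ann", "pw1"), (2, "bob", "pw2"), (3, "ann", "pw1"), (4, "ann", "other")]
print(find_duplicates(sample))
```

Only rows 1 and 3 come back: row 4 has the same nick_name but a different password, so it is not a duplicate of the pair.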

Find deduplicated records

For each group of duplicates, find the record with the largest id

SELECT * FROM user 
WHERE id in
(SELECT max(id) FROM user
group by nick_name,password 
having count(nick_name)>1
);

Delete duplicate records

Keep only the record with the smallest id value in each group of duplicates

DELETE  c1
FROM  customer c1,customer c2
WHERE c1.cust_email=c2.cust_email
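AND c1.id>c2.id;

The same cleanup on the user table. The subqueries are wrapped in derived tables (tmp1, tmp2) because MySQL does not allow a subquery to select directly from the table being deleted from

DELETE FROM user WHERE (nick_name,password) IN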
(SELECT nick_name,password FROM
    (SELECT nick_name,password FROM user 
    group by nick_name,password 
    having count(nick_name)>1) as tmp1
)
and id not in
(SELECT id FROM
    (SELECT min(id) id FROM user 
     group by nick_name,password 
     having count(nick_name)>1) as tmp2
);
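To see the effect of keeping the smallest id per group, here is a minimal sketch that assumes SQLite (Python's built-in sqlite3 module) instead of MySQL. It uses a simpler formulation than the statements above: delete every row whose id is not the minimum of its group. SQLite accepts this direct subquery, whereas MySQL needs the derived-table wrapping. The function name and sample rows are hypothetical.

```python
import sqlite3


def dedupe_keep_min_id(rows):
    """Delete duplicates of (nick_name, password), keeping the smallest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, nick_name TEXT, password TEXT)"
    )
    conn.executemany("INSERT INTO user VALUES (?, ?, ?)", rows)
    # Each group contributes exactly one surviving id: its minimum.
    conn.execute(
        """
        DELETE FROM user
        WHERE id NOT IN (
            SELECT MIN(id) FROM user GROUP BY nick_name, password
        )
        """
    )
    remaining = conn.execute("SELECT * FROM user ORDER BY id").fetchall()
    conn.close()
    return remaining


# Hypothetical sample data: ids 1, 3 and 4 are the same pair.
sample = [(1, "ann", "pw1"), (2, "bob", "pw2"), (3, "ann", "pw1"), (4, "ann", "pw1")]
print(dedupe_keep_min_id(sample))
```

Rows that were never duplicated form a group of one, so their own id is the minimum and they are left untouched.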

2 Missing value processing

Find records with missing values


SELECT * FROM customer
WHERE cust_email IS NULL;

Update a column to fill in missing values


UPDATE sale SET city = 'Unknown'
WHERE city IS NULL;

UPDATE orderitems set 
price_new=IFNULL(price_new,5.74);

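Fill missing values in the query output without changing the table. The fill value 5.74 is presumably the average returned by the first query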

SELECT AVG(price_new) FROM orderitems;

SELECT IFNULL(price_new,5.74) AS bus_ifnull
FROM orderitems;
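The two steps above (compute the average, then use it as the fill value) can be combined so the constant is not typed by hand. This is a minimal sketch assuming SQLite (Python's built-in sqlite3 module) in place of MySQL; IFNULL and AVG behave the same way in both for this case. The function name and the sample prices are made up.

```python
import sqlite3


def fill_with_average(prices):
    """Replace NULL prices with the average of the non-NULL prices."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orderitems (id INTEGER PRIMARY KEY, price_new REAL)")
    conn.executemany(
        "INSERT INTO orderitems (price_new) VALUES (?)", [(p,) for p in prices]
    )
    # AVG ignores NULL values, so missing rows do not distort the mean.
    avg = conn.execute("SELECT AVG(price_new) FROM orderitems").fetchone()[0]
    conn.execute("UPDATE orderitems SET price_new = IFNULL(price_new, ?)", (avg,))
    filled = [
        row[0]
        for row in conn.execute("SELECT price_new FROM orderitems ORDER BY id")
    ]
    conn.close()
    return avg, filled


# Hypothetical sample: one missing price between 4.0 and 8.0.
print(fill_with_average([4.0, None, 8.0]))
```

With the sample input, the average of the two known prices is 6.0, and that value replaces the missing one.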

3 Calculated column

Update table to add calculated columns


ALTER TABLE orderitems ADD price_new DECIMAL(8,2) NOT NULL;

UPDATE orderitems set price_new= item_price*count;


Query calculated columns


SELECT item_price*count as sales FROM orderitems;

4 Sort

Sort by multiple columns


SELECT * FROM orderitems
ORDER BY price_new DESC,quantity;

Query the top N records (here the top 5)

SELECT * FROM orderitems
ORDER BY price_new DESC LIMIT 5;

Query the 10th largest value


SELECT DISTINCT price_new
FROM orderitems
ORDER BY price_new DESC LIMIT 9,1;
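In LIMIT 9,1 the first number is the offset (rows to skip) and the second is the row count, so the Nth largest value uses an offset of N-1. The sketch below generalizes this, assuming SQLite (Python's built-in sqlite3 module), which accepts the same comma form as MySQL. The function name and sample prices are hypothetical.

```python
import sqlite3


def nth_largest(prices, n):
    """Return the n-th largest distinct price, or None if there are fewer than n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orderitems (price_new REAL)")
    conn.executemany("INSERT INTO orderitems VALUES (?)", [(p,) for p in prices])
    row = conn.execute(
        # LIMIT offset, count: skip n-1 distinct values, then take one.
        "SELECT DISTINCT price_new FROM orderitems "
        "ORDER BY price_new DESC LIMIT ?, 1",
        (n - 1,),
    ).fetchone()
    conn.close()
    return None if row is None else row[0]


# Hypothetical sample: the duplicate 10 counts once because of DISTINCT.
print(nth_largest([10, 10, 9, 8, 7], 3))
```

Without DISTINCT, the repeated 10 would occupy two positions and the third row would be 9 rather than 8.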

Rank

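Equal values share the same rank, and ranks are consecutive (dense ranking). RANK is a reserved word in MySQL 8.0, so the alias is named price_rank


SELECT prod_price,
(SELECT COUNT(DISTINCT prod_price)
FROM products
WHERE prod_price>=a.prod_price
) AS price_rank
FROM products AS a
ORDER BY price_rank;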
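The correlated subquery counts how many distinct prices are greater than or equal to the current row's price, which is exactly that row's dense rank. A minimal sketch, assuming SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL; the alias price_rank, the function name, and the sample prices are illustrative choices.

```python
import sqlite3


def dense_rank(prices):
    """Rank prices from highest to lowest; ties share a rank, with no gaps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (prod_price REAL)")
    conn.executemany("INSERT INTO products VALUES (?)", [(p,) for p in prices])
    rows = conn.execute(
        """
        SELECT prod_price,
               (SELECT COUNT(DISTINCT prod_price)
                FROM products
                WHERE prod_price >= a.prod_price) AS price_rank
        FROM products AS a
        ORDER BY price_rank
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample: the two 5s tie for rank 1, then 3 is rank 2, 1 is rank 3.
print(dense_rank([5, 3, 5, 1]))
```

MySQL 8.0 and later also provide the DENSE_RANK() window function, which gives the same numbering without a correlated subquery.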

5 String processing

String replacement


UPDATE data1 SET city=REPLACE(city,'SH','shanghai');

SELECT city FROM data1;

Extract a substring by position

Substring extraction can be used to sort data into categories
MySQL substring functions: left(), right(), substring(), substring_index()

Take the first 3 characters of the string


SELECT left('example.com', 3);

Take from the 4th character position of the string until the end


SELECT substring('example.com', 4);

Start from the 4th character position of the string, only take 2 characters


SELECT substring('example.com', 4, 2);
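Positions in these functions are 1-based, which is easy to get wrong when coming from 0-based languages. The sketch below runs the same extractions, assuming SQLite (Python's built-in sqlite3 module) as a stand-in: SQLite has no left() or right(), so substr(s, 1, 3) and substr(s, -3) are used as equivalents. The function name is hypothetical.

```python
import sqlite3


def slices(text):
    """Return (first 3 chars, from 4th char on, 2 chars from the 4th, last 3 chars)."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        # substr is 1-based; a negative start counts from the end of the string.
        "SELECT substr(?, 1, 3), substr(?, 4), substr(?, 4, 2), substr(?, -3)",
        (text, text, text, text),
    ).fetchone()
    conn.close()
    return row


print(slices("example.com"))
```

For 'example.com' the four results are 'exa', 'mple.com', 'mp', and 'com'.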

Extract a substring by delimiter

Take all the characters before the first separator; the result is www


SELECT substring_index('www.google.com','.',1);

Take all the characters after the second-to-last separator; the result is google.com


SELECT substring_index('www.google.com','.',-2);
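The sign of the third argument decides the direction: a positive count keeps everything to the left of the count-th separator, and a negative count keeps everything to the right of the count-th separator counted from the end. The following pure-Python function is a sketch that mirrors that behavior for illustration only. It is not MySQL's implementation, it assumes a non-empty delimiter, and returning an empty string for a count of 0 is this sketch's own choice.

```python
def substring_index(text, delim, count):
    """Mimic the documented behavior of MySQL's SUBSTRING_INDEX (illustrative sketch)."""
    if count == 0:
        return ""  # assumption of this sketch
    parts = text.split(delim)
    if count > 0:
        # Keep the pieces to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Negative count: keep the pieces to the right, counting from the end.
    return delim.join(parts[count:])


print(substring_index("www.google.com", ".", 1))
print(substring_index("www.google.com", ".", -2))
```

If the count is larger than the number of separators, the slice covers every piece and the whole string comes back unchanged.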

6 Filtering

Use operators to achieve advanced filtering
Operators such as AND, OR, IN, and NOT can be combined to build advanced filters

SELECT prod_name,prod_price FROM Products
WHERE vend_id IN('DLL01','BRS01');

SELECT prod_name FROM Products WHERE NOT vend_id='DLL01';

Wildcard filtering

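Common wildcards are % and _. The [] and ^ wildcards are available in some databases such as SQL Server, but MySQL's LIKE does not support them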


SELECT * from customers WHERE country LIKE "CH%";
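The difference between the two LIKE wildcards is that % matches any number of characters (including none) while _ matches exactly one. A minimal sketch, assuming SQLite (Python's built-in sqlite3 module) in place of MySQL; the function name and the country codes are made-up sample data.

```python
import sqlite3


def match_countries(countries, pattern):
    """Return the countries matching a LIKE pattern, sorted alphabetically."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (country TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(c,) for c in countries])
    rows = conn.execute(
        "SELECT country FROM customers WHERE country LIKE ? ORDER BY country",
        (pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


sample = ["CHN", "CHE", "USA", "CH", "CHINA"]
print(match_countries(sample, "CH%"))  # any suffix, including an empty one
print(match_countries(sample, "CH_"))  # exactly one character after CH
```

With this sample, 'CH%' matches CH, CHE, CHINA, and CHN, while 'CH_' matches only CHE and CHN.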

7 Table join

A SQL table join works much like the VLOOKUP function in Excel


SELECT Vendors.vend_id,prod_name,prod_price
FROM Vendors INNER JOIN Products
ON Vendors.vend_id=Products.vend_id;

SELECT prod_name,vend_name,prod_price,quantity
FROM OrderItems,Products,Vendors
WHERE Products.vend_id=Vendors.vend_id
AND OrderItems.prod_id=Products.prod_id
AND order_num=20007;
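One difference from VLOOKUP is worth seeing in action: an INNER JOIN silently drops rows with no match, whereas VLOOKUP shows #N/A. A LEFT JOIN is the closer analogue because it keeps the unmatched row and fills the looked-up column with NULL. The sketch below assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL; the function name and the sample products and vendors are hypothetical.

```python
import sqlite3


def lookup_vendor_names(products, vendors):
    """Return (inner join rows, left join rows) of product name and vendor name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Vendors (vend_id TEXT PRIMARY KEY, vend_name TEXT)")
    conn.execute("CREATE TABLE Products (prod_name TEXT, vend_id TEXT, prod_price REAL)")
    conn.executemany("INSERT INTO Vendors VALUES (?, ?)", vendors)
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", products)
    query = """
        SELECT p.prod_name, v.vend_name
        FROM Products AS p {join} JOIN Vendors AS v
        ON v.vend_id = p.vend_id
        ORDER BY p.prod_name
    """
    inner = conn.execute(query.format(join="INNER")).fetchall()
    left = conn.execute(query.format(join="LEFT")).fetchall()
    conn.close()
    return inner, left


# Hypothetical data: the Kite refers to a vendor id that does not exist.
products = [("Bear", "BRS01", 5.99), ("Doll", "DLL01", 4.99), ("Kite", "XXX01", 9.99)]
vendors = [("BRS01", "Bears R Us"), ("DLL01", "Doll House Inc.")]
inner_rows, left_rows = lookup_vendor_names(products, vendors)
print(inner_rows)
print(left_rows)
```

The inner join returns two rows; the left join returns three, with None (NULL) as the vendor name for the Kite.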

A self-join uses the same table more than once in a single SELECT statement


SELECT c1.cust_id,c1.cust_name,c1.cust_contact
FROM Customers as c1,Customers as c2
WHERE c1.cust_name=c2.cust_name
AND c2.cust_contact='Jim Jones';

8 Pivot

Data grouping provides the functionality of a pivot table in Excel

Data grouping
GROUP BY groups the rows; HAVING filters the groups after grouping


SELECT order_num,COUNT(*) as items
FROM OrderItems
GROUP BY order_num HAVING COUNT(*)>=3;
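HAVING is evaluated after the groups are formed, so it can test aggregate values such as COUNT(*), which WHERE cannot. A minimal sketch of the query above, assuming SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL; the function name and the order numbers are made-up sample data.

```python
import sqlite3


def orders_with_min_items(order_nums, min_items):
    """Return (order_num, item count) for orders with at least min_items rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OrderItems (order_num INTEGER)")
    conn.executemany("INSERT INTO OrderItems VALUES (?)", [(n,) for n in order_nums])
    rows = conn.execute(
        """
        SELECT order_num, COUNT(*) AS items
        FROM OrderItems
        GROUP BY order_num
        HAVING COUNT(*) >= ?
        ORDER BY order_num
        """,
        (min_items,),
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample: one row per order line.
sample = [20007] * 3 + [20008] * 2 + [20009] * 4
print(orders_with_min_items(sample, 3))
```

Order 20008 has only two lines, so its group is filtered out by HAVING.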

Cross table

Implemented with CASE WHEN expressions


SELECT data1.city,
CASE WHEN colour = 'A' THEN price END AS A,
CASE WHEN colour = 'B' THEN price END AS B,
CASE WHEN colour = 'C' THEN price END AS C,
CASE WHEN colour = 'F' THEN price END AS F
FROM data1;
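The CASE WHEN query above spreads colours into columns but still returns one row per source row. To get one row per city, as an Excel pivot table would, each CASE expression can be wrapped in an aggregate and the result grouped by city. The sketch below assumes SUM as the aggregate (the source does not say which one is intended) and uses SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL; the function name and sample rows are hypothetical.

```python
import sqlite3


def pivot_price_by_colour(rows):
    """Return one row per city with summed prices for colours A and B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data1 (city TEXT, colour TEXT, price REAL)")
    conn.executemany("INSERT INTO data1 VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT city,
               SUM(CASE WHEN colour = 'A' THEN price END) AS A,
               SUM(CASE WHEN colour = 'B' THEN price END) AS B
        FROM data1
        GROUP BY city
        ORDER BY city
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample: Shanghai has no rows with colour B.
sample = [
    ("Beijing", "A", 10),
    ("Beijing", "B", 20),
    ("Beijing", "A", 5),
    ("Shanghai", "A", 7),
]
print(pivot_price_by_colour(sample))
```

A city with no rows for a colour gets NULL (None in Python) in that column, because SUM over only NULL values is NULL.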

Origin blog.51cto.com/15057820/2654650