Excel is the most commonly used tool in data analysis. It handles data cleaning, preprocessing, and the everyday operations of classifying, filtering, subtotaling, and pivoting data, and all of these operations can also be implemented in SQL. SQL not only reads data from the database but can also return the required result directly through its functions and statements, which is far more efficient than performing the same calculation in a client application.
1 Duplicate data processing
Find duplicate records
SELECT * FROM user
WHERE (nick_name,password) IN
(
SELECT nick_name,password
FROM user
GROUP BY nick_name,password
HAVING COUNT(nick_name)>1
);
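The query above can be tried end to end with a small script. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (it needs SQLite 3.15 or later for the row-value IN); the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, nick_name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO user (id, nick_name, password) VALUES (?, ?, ?)",
    [(1, "ann", "pw1"), (2, "bob", "pw2"), (3, "ann", "pw1"), (4, "ann", "other")],
)

# Every row whose (nick_name, password) pair occurs more than once.
duplicate_rows = conn.execute(
    """
    SELECT * FROM user
    WHERE (nick_name, password) IN (
        SELECT nick_name, password
        FROM user
        GROUP BY nick_name, password
        HAVING COUNT(nick_name) > 1
    )
    ORDER BY id
    """
).fetchall()
print(duplicate_rows)
```

Rows 1 and 3 share both columns, so only those two come back; row 4 has the same nick_name but a different password and is not treated as a duplicate.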
Find the deduplicated records
For each group of duplicates, find the record with the largest id
SELECT * FROM user
WHERE id IN
(SELECT MAX(id) FROM user
GROUP BY nick_name,password
HAVING COUNT(nick_name)>1
);
Delete duplicate records
Only keep the record with the smallest id value
DELETE c1
FROM customer c1,customer c2
WHERE c1.cust_email=c2.cust_email
AND c1.id>c2.id;
The same cleanup for the user table: delete every duplicated row except the one with the smallest id in its group
DELETE FROM user WHERE (nick_name,password) IN
(SELECT nick_name,password FROM
(SELECT nick_name,password FROM user
GROUP BY nick_name,password
HAVING COUNT(nick_name)>1) AS tmp1
)
AND id NOT IN
(SELECT id FROM
(SELECT MIN(id) id FROM user
GROUP BY nick_name,password
HAVING COUNT(nick_name)>1) AS tmp2
);
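The "keep the smallest id" rule can be checked on a toy table. The sketch below is a simplified variant, not the statement above: it runs on SQLite through Python's sqlite3 module, where a DELETE may read the table it is deleting from, so the extra derived tables (which MySQL needs) are left out. The sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, nick_name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO user (id, nick_name, password) VALUES (?, ?, ?)",
    [(1, "ann", "pw1"), (2, "bob", "pw2"), (3, "ann", "pw1"), (4, "ann", "pw1"), (5, "cy", "pw3")],
)

# Keep the smallest id of every (nick_name, password) group, delete the rest.
conn.execute(
    """
    DELETE FROM user
    WHERE id NOT IN (
        SELECT MIN(id) FROM user GROUP BY nick_name, password
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM user ORDER BY id")]
print(remaining_ids)
```

Groups with a single row keep that row because it is its own minimum, so no separate HAVING filter is needed in this variant.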
2 Missing value processing
Find records with missing values
SELECT * FROM customer
WHERE cust_email IS NULL;
Update a column to fill in missing values
UPDATE sale SET city = 'unknown'
WHERE city IS NULL;
UPDATE orderitems set
price_new=IFNULL(price_new,5.74);
Fill missing values in the query result
SELECT AVG(price_new) FROM orderitems;
SELECT IFNULL(price_new,5.74) AS bus_ifnull
FROM orderitems;
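A common pattern is to compute the column average first and then use it as the fill value. The following is a minimal sketch of that two-step flow, assuming SQLite (via Python's sqlite3 module) in place of MySQL and a hypothetical four-row orderitems table; IFNULL behaves the same way in both engines.

```python
import sqlite3

# Hypothetical sample data; two of the four prices are missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderitems (id INTEGER PRIMARY KEY, price_new REAL)")
conn.executemany(
    "INSERT INTO orderitems (id, price_new) VALUES (?, ?)",
    [(1, 4.0), (2, None), (3, 6.0), (4, None)],
)

# AVG skips NULLs, so the average is taken over the known prices only.
average_price = conn.execute("SELECT AVG(price_new) FROM orderitems").fetchone()[0]

# Replace each NULL with the average; non-NULL values are left unchanged.
conn.execute(
    "UPDATE orderitems SET price_new = IFNULL(price_new, ?)",
    (average_price,),
)

filled_prices = [row[0] for row in conn.execute("SELECT price_new FROM orderitems ORDER BY id")]
print(average_price, filled_prices)
```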
3 Calculated columns
Add a calculated column to the table
ALTER TABLE orderitems ADD price_new DECIMAL(8,2) NOT NULL;
UPDATE orderitems SET price_new=item_price*quantity;
Query a calculated column
SELECT item_price*quantity AS sales FROM orderitems;
4 Sort
Sort by multiple columns
SELECT * FROM orderitems
ORDER BY price_new DESC,quantity;
Query the top few records
SELECT * FROM orderitems
ORDER BY price_new DESC LIMIT 5;
Query the 10th largest value
SELECT DISTINCT price_new
FROM orderitems
ORDER BY price_new DESC LIMIT 9,1;
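MySQL's LIMIT 9,1 means "skip 9 rows, return 1", which is the same as LIMIT 1 OFFSET 9. The sketch below wraps that idea in a hypothetical helper, nth_largest_price, and runs it on SQLite through Python's sqlite3 module with made-up prices.

```python
import sqlite3

# Hypothetical sample data with one repeated price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderitems (id INTEGER PRIMARY KEY, price_new REAL)")
conn.executemany(
    "INSERT INTO orderitems (price_new) VALUES (?)",
    [(10.0,), (9.0,), (9.0,), (8.0,), (7.0,)],
)


def nth_largest_price(connection, n):
    """Return the n-th largest distinct price, or None if there are fewer than n."""
    row = connection.execute(
        "SELECT DISTINCT price_new FROM orderitems "
        "ORDER BY price_new DESC LIMIT 1 OFFSET ?",
        (n - 1,),  # skip the n-1 larger distinct values
    ).fetchone()
    return row[0] if row else None


print(nth_largest_price(conn, 3))
```

DISTINCT matters here: without it the repeated 9.0 would occupy two positions and the third row would be 9.0 rather than 8.0.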
Rank
Equal values share the same rank, and ranks are consecutive (dense ranking)
SELECT prod_price,
(SELECT COUNT(DISTINCT prod_price)
FROM products
WHERE prod_price>=a.prod_price
) AS `rank`
FROM products AS a
ORDER BY `rank`;
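The ranking logic counts how many distinct prices are greater than or equal to the current one. A minimal sketch on SQLite (via Python's sqlite3 module) with hypothetical prices; the alias is called price_rank here purely to sidestep keyword concerns across engines.

```python
import sqlite3

# Hypothetical sample data with one tie at 5.99.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_price REAL)")
conn.executemany(
    "INSERT INTO products (prod_price) VALUES (?)",
    [(3.49,), (5.99,), (5.99,), (9.49,)],
)

# For each row, the rank is the number of distinct prices >= its own price.
ranked = conn.execute(
    """
    SELECT a.prod_price,
           (SELECT COUNT(DISTINCT b.prod_price)
            FROM products AS b
            WHERE b.prod_price >= a.prod_price) AS price_rank
    FROM products AS a
    ORDER BY price_rank
    """
).fetchall()
print(ranked)
```

The two 5.99 rows both get rank 2 and the next price gets rank 3, so there is no gap after the tie.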
5 String processing
String replacement
UPDATE data1 SET city=REPLACE(city,'SH','shanghai');
SELECT city FROM data1;
Extract a substring by position
String extraction can be used to split data into columns
MySQL string extraction functions: left(), right(), substring(), substring_index()
Take the first 3 characters from the left of the string
SELECT left('example.com', 3);
Start from the 4th character of the string and take everything up to the end
SELECT substring('example.com', 4);
Start from the 4th character of the string and take only 2 characters
SELECT substring('example.com', 4, 2);
Extract a substring by delimiter
Take all the characters before the first separator; the result is www
SELECT substring_index('www.google.com','.',1);
Take all the characters after the second-to-last separator; the result is google.com
SELECT substring_index('www.google.com','.',-2);
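To make the position and delimiter rules concrete, here is a rough Python equivalent. The substring_index function below is a hypothetical re-implementation written for illustration, not a MySQL API; the slices mirror left() and substring(), keeping in mind that SQL positions start at 1 while Python indexes start at 0.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Mimic MySQL's SUBSTRING_INDEX for positive and negative counts."""
    if count == 0 or delim == "":
        # Guard: nothing to return for count 0, and str.split rejects "".
        return ""
    parts = text.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Everything after the |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])


s = "example.com"
left_3 = s[:3]           # left(s, 3)
from_4th = s[3:]         # substring(s, 4)
two_from_4th = s[3:5]    # substring(s, 4, 2)

print(left_3, from_4th, two_from_4th)
print(substring_index("www.google.com", ".", 1))
print(substring_index("www.google.com", ".", -2))
```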
6 Filtering
Use operators to achieve advanced filtering
Operators such as AND, OR, IN, and NOT can be combined for advanced filtering
SELECT prod_name,prod_price FROM Products
WHERE vend_id IN('DLL01','BRS01');
SELECT prod_name FROM Products WHERE NOT vend_id='DLL01';
Wildcard filtering
Common wildcards are %, _, [] and ^ (MySQL's LIKE supports only % and _)
SELECT * FROM customers WHERE country LIKE 'CH%';
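The difference between the two portable wildcards is easiest to see side by side: % matches any run of characters (including none), while _ matches exactly one. A minimal sketch on SQLite through Python's sqlite3 module, with hypothetical country values.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers (country) VALUES (?)",
    [("CHINA",), ("CHILE",), ("CANADA",), ("CHAD",)],
)

# % matches zero or more characters: everything starting with CH.
starts_with_ch = [
    row[0]
    for row in conn.execute(
        "SELECT country FROM customers WHERE country LIKE 'CH%' ORDER BY country"
    )
]

# _ matches exactly one character: four-letter values shaped like CH?D.
ch_any_d = [
    row[0]
    for row in conn.execute(
        "SELECT country FROM customers WHERE country LIKE 'CH_D' ORDER BY country"
    )
]

print(starts_with_ch, ch_any_d)
```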
7 Table join
A SQL table join works much like the VLOOKUP function in Excel
SELECT Vendors.vend_id,prod_name,prod_price
FROM Vendors INNER JOIN Products
ON Vendors.vend_id=Products.vend_id;
SELECT prod_name,vend_name,prod_price,quantity
FROM OrderItems,Products,Vendors
WHERE Products.vend_id=Vendors.vend_id
AND OrderItems.prod_id=Products.prod_id
AND order_num=20007;
A self-join uses the same table more than once in a single SELECT statement
SELECT c1.cust_id,c1.cust_name,c1.cust_contact
FROM Customers as c1,Customers as c2
WHERE c1.cust_name=c2.cust_name
AND c2.cust_contact='Jim Jones';
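The VLOOKUP comparison can be illustrated with two tiny tables. In this sketch (SQLite via Python's sqlite3 module, hypothetical rows), the INNER JOIN drops a product whose vendor is missing, while a LEFT JOIN keeps it with an empty vendor name, roughly what VLOOKUP shows as a failed lookup.

```python
import sqlite3

# Hypothetical sample data; product 'Mystery item' has no matching vendor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vendors (vend_id TEXT PRIMARY KEY, vend_name TEXT)")
conn.execute("CREATE TABLE Products (prod_name TEXT, vend_id TEXT, prod_price REAL)")
conn.executemany(
    "INSERT INTO Vendors VALUES (?, ?)",
    [("DLL01", "Doll House Inc."), ("BRS01", "Bears R Us")],
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        ("Fish bean bag toy", "DLL01", 3.49),
        ("8 inch teddy bear", "BRS01", 5.99),
        ("Mystery item", "XXX01", 1.0),
    ],
)

# INNER JOIN: only products whose vend_id exists in Vendors.
inner_rows = conn.execute(
    """
    SELECT Vendors.vend_name, Products.prod_name, Products.prod_price
    FROM Vendors INNER JOIN Products
      ON Vendors.vend_id = Products.vend_id
    ORDER BY Products.prod_name
    """
).fetchall()

# LEFT JOIN from Products: every product, with NULL where the lookup fails.
left_rows = conn.execute(
    """
    SELECT Products.prod_name, Vendors.vend_name
    FROM Products LEFT JOIN Vendors
      ON Vendors.vend_id = Products.vend_id
    ORDER BY Products.prod_name
    """
).fetchall()

print(inner_rows)
print(left_rows)
```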
8 Pivot
Data grouping provides the functionality of a pivot table in Excel
Data grouping
GROUP BY groups the rows; HAVING filters the groups after grouping
SELECT order_num,COUNT(*) as items
FROM OrderItems
GROUP BY order_num HAVING COUNT(*)>=3;
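WHERE filters individual rows before grouping, whereas HAVING filters whole groups afterwards. The sketch below runs the same GROUP BY ... HAVING query on a hypothetical OrderItems table, using SQLite through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical sample data: order 20005 has 2 items, 20007 has 5, 20008 has 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderItems (order_num INTEGER, prod_id TEXT)")
rows = (
    [(20005, f"P{i}") for i in range(2)]
    + [(20007, f"P{i}") for i in range(5)]
    + [(20008, f"P{i}") for i in range(3)]
)
conn.executemany("INSERT INTO OrderItems VALUES (?, ?)", rows)

# Count the items per order, then keep only orders with at least 3 items.
large_orders = conn.execute(
    """
    SELECT order_num, COUNT(*) AS items
    FROM OrderItems
    GROUP BY order_num
    HAVING COUNT(*) >= 3
    ORDER BY order_num
    """
).fetchall()
print(large_orders)
```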
Cross table (crosstab)
Implemented with CASE WHEN expressions
SELECT data1.city,
CASE WHEN colour = "A" THEN price END AS A,
CASE WHEN colour = "B" THEN price END AS B,
CASE WHEN colour = "C" THEN price END AS C,
CASE WHEN colour = "F" THEN price END AS F
FROM data1
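Without aggregation, the CASE WHEN columns still produce one output row per source row. To collapse them into one row per city, each CASE expression is usually wrapped in an aggregate and the query is grouped by city. The sketch below assumes SUM is the aggregate wanted, which the original does not state, and runs on SQLite through Python's sqlite3 module with hypothetical data.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data1 (city TEXT, colour TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO data1 VALUES (?, ?, ?)",
    [
        ("Beijing", "A", 1200),
        ("Beijing", "B", 800),
        ("Shanghai", "A", 2000),
        ("Shanghai", "A", 500),
        ("Shanghai", "F", 300),
    ],
)

# One row per city, one column per colour; SUM ignores the NULLs that the
# CASE expressions yield for non-matching rows (assumption: totals are wanted).
crosstab = conn.execute(
    """
    SELECT city,
           SUM(CASE WHEN colour = 'A' THEN price END) AS A,
           SUM(CASE WHEN colour = 'B' THEN price END) AS B,
           SUM(CASE WHEN colour = 'C' THEN price END) AS C,
           SUM(CASE WHEN colour = 'F' THEN price END) AS F
    FROM data1
    GROUP BY city
    ORDER BY city
    """
).fetchall()
print(crosstab)
```

A city with no rows for a given colour shows NULL (None in Python) in that column; adding ELSE 0 inside each CASE would show 0 instead.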