Data retrieval workflow and SQL examples for data analysis at big data companies

Effective data analysis starts with retrieving the required data from a large database, which calls for skill in querying with SQL.

The SQL workflow

SQL (Structured Query Language) is used to insert, delete, update, and query data in relational databases. For data analysts, the most common operation is the query: extracting the rows that meet given conditions for calculation or visualization. So what is the basic process for retrieving data with SQL?

  • First, determine the target tables and fields to retrieve and the metrics to calculate. This requires a clear understanding of the business requirements: which fields come from which tables, and what needs to be calculated.

  • Second, write the SQL query. Use the select, from, where, group by, having, and order by clauses to specify the fields, tables, filter conditions, grouping, and sorting. This step requires a solid grasp of SQL syntax and flexible use of its functions.

  • Third, connect to the database and run the query to retrieve the data, then check the result rigorously for accuracy and completeness: look for missing, abnormal, or incorrect values.

  • Finally, export the query results to other tools or platforms for further processing or presentation. Different scenarios call for different tools, such as Excel, Power BI, or Tableau.
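The query-writing step above can be sketched in Python with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the orders table, its columns, and its rows are hypothetical sample data, and an in-memory SQLite database stands in for whatever database is actually used. It shows how the six clauses fit together in one query.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up purely for illustration.
conn.execute(
    "create table orders (order_id int primary key, region text, amount real, status text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        (1, "north", 120.0, "paid"),
        (2, "north", 80.0, "paid"),
        (3, "south", 50.0, "paid"),
        (4, "south", 300.0, "cancelled"),
        (5, "east", 250.0, "paid"),
    ],
)

query = """
select region, sum(amount) as total_amount   -- fields and metric to return
from orders                                  -- target table
where status = ?                             -- row-level filter, applied before grouping
group by region                              -- one output row per region
having sum(amount) >= ?                      -- group-level filter, applied after grouping
order by total_amount desc                   -- sort the final result
"""

# Values are passed as parameters instead of being pasted into the SQL text.
rows = conn.execute(query, ("paid", 100.0)).fetchall()
print(rows)  # [('east', 250.0), ('north', 200.0)]
```

Note that the cancelled order is dropped by where before any totals are computed, while the south region is dropped by having because its paid total (50.0) is below the threshold.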

SQL statement types

How the SQL is written depends on the specific business requirements, but in general it falls into the following kinds of statements:

  • Create statement: creates a database or table, specifying the table name, field names, field types, primary key, indexes, and other attributes.

  • Insert statement: inserts data into a table, specifying the table name and the field values to insert.

  • Query statement: reads data from a table, specifying the fields, table, conditions, sorting, grouping, and so on.

  • Update statement: modifies data in a table, specifying the table name, the fields and their new values, and the condition that selects the rows to modify.

  • Delete statement: deletes data from a table, specifying the table name and the condition that selects the rows to delete.

A simple example of each kind of statement follows:

-- Create-table statement
create table products (
  prod_id int primary key, -- product ID
  prod_name varchar(50) not null, -- product name
  prod_price decimal(10,2) check (prod_price > 0), -- product price, must be positive
  prod_category varchar(20) -- product category
);

-- Insert statements
insert into products values (1, 'iPhone 14', 6999.00, 'phone');
insert into products values (2, 'iPad Pro', 4999.00, 'tablet');
insert into products values (3, 'MacBook Air', 7999.00, 'laptop');

-- Query statements
select * from products; -- all product information
select prod_name, prod_price from products where prod_category = 'phone'; -- name and price of products in the phone category
select prod_category, avg(prod_price) as avg_price from products group by prod_category; -- average product price per category

-- Update statement
update products set prod_price = prod_price * 0.9 where prod_id = 1; -- apply a 10% discount to the product with ID 1

-- Delete statement
delete from products where prod_price < 5000; -- delete products priced below 5000

A worked SQL example

Let's look at another simple example. Suppose we want to retrieve the product name, price, and category from a product table and calculate the average price of the products in each category.

First, we determine the target table, fields, and metric as follows:

  • Target table: products

  • Target fields: prod_name (product name), prod_price (product price), prod_category (product category)

  • Target metric: prod_category_avg_price (average price of products per category)

Second, we write the SQL query statement as follows:

-- Query statement
select prod_name, prod_price, prod_category,
       avg(prod_price) over (partition by prod_category) as prod_category_avg_price
from products;

Where:

  • The select clause specifies the fields to return;

  • The from clause specifies the table to query;

  • The avg function calculates the average value;

  • The over clause turns avg into a window function and defines its window, that is, how rows are partitioned and, optionally, ordered;

  • The partition by clause splits the rows into one partition per product category. Unlike group by, it does not collapse rows: every product row is kept and carries its category's average;

  • The as keyword gives the calculated field an alias.
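The difference between partition by and group by is easiest to see side by side. The following sketch makes a few assumptions: SQLite is used through Python's sqlite3 module, the bundled SQLite library is version 3.25 or later (the first release with window functions), and a made-up fourth product is added so that one category holds two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite 3.25+ for window functions
conn.execute(
    "create table products (prod_id int primary key, prod_name text, "
    "prod_price real, prod_category text)"
)
conn.executemany(
    "insert into products values (?, ?, ?, ?)",
    [
        (1, "iPhone 14", 6999.00, "phone"),
        (2, "iPad Pro", 4999.00, "tablet"),
        (3, "MacBook Air", 7999.00, "laptop"),
        # Hypothetical row, added so the phone category has two products.
        (4, "Hypothetical Phone B", 4999.00, "phone"),
    ],
)

# Window function: every product row is kept and gains its category average.
windowed = conn.execute("""
    select prod_name, prod_price, prod_category,
           avg(prod_price) over (partition by prod_category) as prod_category_avg_price
    from products
    order by prod_id
""").fetchall()

# group by: rows are collapsed to one row per category.
grouped = conn.execute("""
    select prod_category, avg(prod_price) as avg_price
    from products
    group by prod_category
    order by prod_category
""").fetchall()

for row in windowed:
    print(row)
print(grouped)
```

The windowed query returns four rows, with both phone rows showing the same category average, while the grouped query returns three rows and no per-product detail. Which form to use depends on whether the product-level fields are still needed downstream.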

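Then, we run the SQL query to retrieve the data from the database and check that it is correct and complete. Assuming the table still holds the three rows as originally inserted (before the update and delete statements shown earlier), we get the following query results: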

prod_name    prod_price  prod_category  prod_category_avg_price
iPhone 14    6999.00     phone          6999.00
iPad Pro     4999.00     tablet         4999.00
MacBook Air  7999.00     laptop         7999.00

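We can see that the query results contain the fields and the metric we want, with no missing or abnormal values, so the data can be considered correct and complete. Because each category contains exactly one product here, each category's average price equals that product's price.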
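On a table this small the check can be done by eye; on real data it is usually expressed as queries. Below is a minimal sketch of such checks, assuming SQLite via Python's sqlite3 module. The helper name check_products, the rules it applies, and the deliberately dirty rows 4 and 5 are all hypothetical; the table is created without constraints so that bad rows can exist.

```python
import sqlite3

def check_products(conn):
    """Return simple data-quality counts for a products table (hypothetical helper)."""
    def count(sql):
        return conn.execute(sql).fetchone()[0]

    return {
        "row_count": count("select count(*) from products"),
        "missing_category": count(
            "select count(*) from products "
            "where prod_category is null or prod_category = ''"
        ),
        "bad_price": count(
            "select count(*) from products "
            "where prod_price is null or prod_price <= 0"
        ),
        "duplicate_names": count(
            "select count(*) from ("
            "  select prod_name from products group by prod_name having count(*) > 1"
            ") as dup"
        ),
    }

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table products (prod_id int, prod_name text, prod_price real, prod_category text)"
)
conn.executemany(
    "insert into products values (?, ?, ?, ?)",
    [
        (1, "iPhone 14", 6999.00, "phone"),
        (2, "iPad Pro", 4999.00, "tablet"),
        (3, "MacBook Air", 7999.00, "laptop"),
        (4, "iPad Pro", None, "tablet"),  # made-up row: duplicate name, missing price
        (5, "Unknown", -1.0, None),       # made-up row: negative price, missing category
    ],
)

report = check_products(conn)
print(report)
```

Any non-zero count other than row_count points at rows worth inspecting before the data is used. Which rules matter (allowed price range, required fields, uniqueness) depends on the business requirements.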

Finally, we export the query results to Excel for further processing. There, the results can be sorted, filtered, analyzed, or charted as needed.
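One common way to hand results to Excel is a CSV file, which Excel opens directly. The sketch below is an assumption about how the export could be done, not the only route: it uses Python's standard csv and sqlite3 modules and writes to an in-memory buffer so that it runs without touching the file system.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
conn.execute(
    "create table products (prod_id int primary key, prod_name text, "
    "prod_price real, prod_category text)"
)
conn.executemany(
    "insert into products values (?, ?, ?, ?)",
    [
        (1, "iPhone 14", 6999.00, "phone"),
        (2, "iPad Pro", 4999.00, "tablet"),
        (3, "MacBook Air", 7999.00, "laptop"),
    ],
)

cursor = conn.execute(
    "select prod_name, prod_price, prod_category from products order by prod_id"
)
# Column names come from the cursor, so the header always matches the query.
header = [column[0] for column in cursor.description]

# To write a real file instead, replace the buffer with something like
# open("products.csv", "w", newline="", encoding="utf-8-sig").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(cursor)  # the cursor yields one tuple per result row

csv_text = buffer.getvalue()
print(csv_text)
```

The file name products.csv in the comment is a placeholder. The utf-8-sig encoding adds a byte-order mark, which helps Excel detect UTF-8 when the data contains non-ASCII text.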

Origin blog.csdn.net/apkkkk/article/details/131051492