Quick Start with the PieCloudDB Database: A Data Example Walkthrough

PieCloudDB "Cloud on Cloud", the new generation of cloud-native virtual data warehouse, was released on March 14, 2023. Starting from data import and using virtual e-commerce sales data as the example, this blog demonstrates the query computation and query history features in detail and guides you through getting started with the cloud version of PieCloudDB. For a detailed video walkthrough, see the Quick Start PieCloudDB video.

The content of this article is roughly divided into the following five parts:

  • Create a virtual data warehouse
  • Create a new folder and SQL file
  • Simple example: create database and table
  • Complex example: data upload -- virtual e-commerce sales data
  • Query Evaluation: Query History Feature Application

The first step of data computing: create a virtual data warehouse

Log in to the cloud version of PieCloudDB (app.pieclouddb.com), enter the main interface, and click "Virtual Data Warehouse" on the left menu bar.

Enter the "Virtual Data Warehouse" interface, click "New Virtual Data Warehouse" in the upper right corner to create a new virtual data warehouse.

Fill in the virtual data warehouse name, number of nodes, node size and remarks. After completion, click "Confirm" to activate the virtual data warehouse.

Wait for the status of the virtual data warehouse to change from "Starting" to "Running"; it can then be used to execute SQL tasks.

Create a new folder and SQL file

After the virtual data warehouse is created and running, click "Data Insight" to enter the "Data Insight" interface. Click the folder icon shown in the figure to create a folder.

Fill in the folder name and click OK.

Click the file icon as shown in the figure to create a new SQL file.

Click the "···" button on the right side of the file name in the file column to rename, move, delete or export the file.

Click "Rename" to rename the file to "demo_query", and click "OK" to complete the renaming.

After the SQL file is created, query statements can be written in the file to execute SQL tasks. PieCloudDB will automatically save the updated SQL file.

Simple example: create database and table

After ensuring that there is an available virtual data warehouse, open a SQL file (here we use the previously created demo_query file as an example) and enter the following query statement.

create database testdb;

Before running the above query, select the corresponding database and virtual data warehouse to execute the SQL task. Here we choose the initial database "openpie" and select an available virtual data warehouse "virtual data warehouse 1".

Select the statement and click "Execute" to run it.

The result is as follows; the database has been created successfully.

To create a table in the new database, switch the database used to execute queries to the newly created "testdb", as shown in the figure.

Execute the following statement to create a table that stores movie data:

create table test_table (
ID char(10),
Name char(50),
Length int,
Date char(10),
Type char(20)
);

Similar to the previous example of creating a database, select and execute this statement in PieCloudDB.

After the table is created, run the following SQL statement to add two new records to it.

insert into test_table VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, '1985-02-10', 'Comedy');

Run the following select statement to view the newly added records in the data table.

select * from test_table;

The result in PieCloudDB is shown in the figure.
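
For reference, in plain text the two returned rows look roughly like this (unquoted column names are folded to lowercase, and the console renders the result as a grid):

id    | name            | length | date       | type
------+-----------------+--------+------------+-------
B6717 | Tampopo         | 110    | 1985-02-10 | Comedy
HG120 | The Dinner Game | 140    | 1985-02-10 | Comedy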

Complex example: data upload -- virtual e-commerce sales data

Having gained a preliminary understanding of PieCloudDB's "Data Insight" feature, we now use the "Data Integration" feature to upload larger data files and analyze them. The virtual e-commerce sales data uploaded here is structured roughly as follows.

First, go to "Data Insight" and run the following SQL statement to create the database that will hold the data.

create database 线上销售数据;

As in the earlier simple example, switch the database used to execute queries to the newly created "线上销售数据" (online sales data). Then run the following statements to create three schemas in it.

create schema 服装销售数据;
create schema 食品销售数据;
create schema 顾客数据;

With these schemas in place, we can run the following statements to create the corresponding tables in each schema.

"Food sales data" schema - food related data data:

-- Food product data
create table 食品销售数据.食品产品数据 (
  产品编号 VARCHAR(10) NOT NULL,
  原料 VARCHAR(5),
  类型 VARCHAR(5),
  价格 FLOAT,
  库存 INT,
  产品图片 TEXT
);

-- Food transaction data, 2020 to 2023
create table "食品销售数据".交易数据_2020_2023 (
  交易编号 VARCHAR(10) NOT NULL,
  顾客序号 VARCHAR(10) NOT NULL,
  产品编号 VARCHAR(10) NOT NULL,
  交易日期 VARCHAR(10),
  交易时间 TIME,
  件数 INT,
  平台 VARCHAR(5)
);

"Clothing sales data" schema - clothing related data data:

-- Clothing product data
create table 服装销售数据.服装产品数据 (
  产品编号 VARCHAR(10) NOT NULL,
  颜色 VARCHAR(5),
  类型 VARCHAR(5),
  价格 FLOAT,
  库存 INT,
  产品图片 TEXT
);

-- Clothing transaction data, 2020 to 2023
create table "服装销售数据".交易数据_2020_2023 (
  交易编号 VARCHAR(10) NOT NULL,
  顾客序号 VARCHAR(10) NOT NULL,
  产品编号 VARCHAR(10) NOT NULL,
  交易日期 VARCHAR(10),
  交易时间 TIME,
  件数 INT,
  平台 VARCHAR(5)
);

"Customer Data" schema - user-related information:

-- Customer data, 2020 to 2023
create table 顾客数据.顾客数据_2020_2023 (
  顾客序号 VARCHAR(10) NOT NULL,
  顾客姓名 VARCHAR(5),
  生日 DATE,
  注册日期 DATE,
  手机号 VARCHAR(11),
  省份 VARCHAR(10),
  城市 VARCHAR(10),
  地区 VARCHAR(10),
  地址 VARCHAR(100)
);
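
Before loading any data, it can be worth checking that the schemas and tables above were created as expected. A minimal sketch, assuming PieCloudDB exposes the standard PostgreSQL information_schema views (it is PostgreSQL-compatible):

-- List the tables created in the three schemas above
select table_schema, table_name
  from information_schema.tables
 where table_schema in ('服装销售数据', '食品销售数据', '顾客数据')
 order by table_schema, table_name;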

After creating the required tables, upload the data files to the corresponding tables one by one. Click "Data Integration" on the left menu bar and select "Import Data" to enter the interface, then click "Import Data" in the upper right corner to start importing.

After entering the "Import Data" interface, follow the steps shown in the figure to upload the data file to the corresponding data table.

After adding a file, you can click the eye icon to the left of the file name to preview it, or click the gear icon next to it to modify the file's upload options. Click "Start" on the right to upload a single file into the database.

PieCloudDB can also upload multiple data files into the same table. As shown in the figure, the clothing transaction data consists of multiple files.

When there are multiple files, "Advanced Options" at the bottom adjusts the global upload options; to set options for an individual file, click the gear button to the left of that file's name as described above. Because there are many files here, click "Start All" to upload them in sequence.
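
The web upload flow above is the path demonstrated in this article. As an aside, since PieCloudDB is PostgreSQL-compatible, loading from a SQL client may also be possible; the following is only a sketch under that assumption, using psql's client-side \copy command and a hypothetical local CSV file name:

-- Hypothetical example: load one clothing transaction CSV from your machine via psql
-- (adjust the file path and options to match the actual files; this assumes a header row)
\copy "服装销售数据"."交易数据_2020_2023" from 'clothing_transactions_2020.csv' with csv header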

Once the data is loaded into the individual tables, we can run some joined queries across them for data analysis.
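
Before writing those queries, a quick row-count sanity check can confirm that each upload landed in the intended table; for example:

-- Row counts per table (the values depend on the uploaded files)
select '服装交易' as "数据表", count(*) as "行数" from "服装销售数据"."交易数据_2020_2023"
union all
select '食品交易', count(*) from "食品销售数据"."交易数据_2020_2023"
union all
select '顾客', count(*) from "顾客数据"."顾客数据_2020_2023";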

The following SQL statements create two views, one based on clothing transaction information and the other on food transaction information, so that for a given transaction we can quickly look up the product involved and the customer who purchased it.

-- View based on clothing transactions
create view "服装销售数据"."交易数据全部信息_2020_2023" as (
  select "交易编号",a."顾客序号", a."产品编号", "交易日期", "交易时间", "件数", "平台", "颜色", "类型", "价格", "库存", "产品图片", 
    "顾客姓名",c."生日","注册日期","省份","城市","地区","地址"
    from "服装销售数据"."交易数据_2020_2023" as a 
    left join "服装销售数据"."服装产品数据" as b 
    on a."产品编号" = b."产品编号" 
    left join "顾客数据"."顾客数据_2020_2023" as c
    on a."顾客序号" = c."顾客序号"
    order by a."交易日期" desc
);

-- View based on food transactions
create view "食品销售数据"."交易数据全部信息_2020_2023" as (
  select "交易编号",a."顾客序号", a."产品编号", "交易日期", "交易时间", "件数", "平台", "原料", "类型", "价格", "库存", "产品图片", 
    "顾客姓名",c."生日","注册日期","省份","城市","地区","地址"
    from "食品销售数据"."交易数据_2020_2023" as a 
    left join "食品销售数据"."食品产品数据" as b 
    on a."产品编号" = b."产品编号" 
    left join "顾客数据"."顾客数据_2020_2023" as c
    on a."顾客序号" = c."顾客序号"
    order by a."交易日期" desc
);
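
As a quick check that a view behaves as expected, you can preview a few rows from it, for example (the column selection and limit here are arbitrary):

-- Preview joined clothing transactions with product and customer details
select "交易编号", "交易日期", "类型", "价格", "件数", "顾客姓名", "城市"
  from "服装销售数据"."交易数据全部信息_2020_2023"
 limit 10;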

Using these two views, we can easily query sales by city on Double Twelve (December 12) 2022, sorted from highest to lowest.

select a."省份", a."城市", sum(a."总价") as "销售额" 
  from (
  select "交易编号", "顾客序号", "产品编号", "交易日期", "件数"*"价格" as "总价", "省份", "城市"
  	from "服装销售数据"."交易数据全部信息_2020_2023" 
  union 
  select "交易编号", "顾客序号", "产品编号", "交易日期", "件数"*"价格" as "总价", "省份", "城市" 
    from "食品销售数据"."交易数据全部信息_2020_2023"
  where "交易日期" = '2022-12-12') as a
group by "省份", "城市"
order by "销售额" desc;

The results of executing this query are as follows:

The top six cities by sales in the data are all municipalities directly under the central government, and most of the top ten are cities north of the Yangtze River. (The data is synthetic; city information is randomly matched to the virtual customers.)

We can also use a window function to find the top 10% of customers in each of the clothing and food categories, based on the sales and customer information in the tables.

with sale_by_customer as (
  (select "顾客序号", sum("件数"*"价格") as "销售额", cast('服装' as varchar(5)) as "销售种类" 
    from "服装销售数据"."交易数据_2020_2023" as a
  left join "服装销售数据"."服装产品数据" as b 
    on a."产品编号" = b."产品编号" 
    group by "顾客序号" 
    order by "销售额" desc, "顾客序号")
UNION
  (select "顾客序号", sum("件数"*"价格") as "销售额", cast('食品' as varchar(5)) as "销售种类" 
    from "食品销售数据"."交易数据_2020_2023" as a
  left join "食品销售数据"."食品产品数据" as b 
    on a."产品编号" = b."产品编号" 
    group by "顾客序号" 
    order by "销售额" desc, "顾客序号")),
customer_ranking as (
  select *,
         row_number() over (partition by "销售种类" order by "销售额" desc) as ranking,
         count(*) over (partition by "销售种类") as cnt
  from sale_by_customer)
select c."顾客序号","顾客姓名","生日","注册日期","省份","城市","地区","地址", "销售种类" as "vip销售种类"
  from customer_ranking as c
  left join "顾客数据"."顾客数据_2020_2023" as d
    on c."顾客序号" = d."顾客序号"
  where cast(ranking as decimal)/cnt <= 0.1
;

The result of the query is as follows; the customers' names, birthdays, addresses, and registration dates are all fictitious.

These users are high-value customers in the clothing and food categories. To increase the repurchase rate, we could consider offering them targeted promotions to encourage repeat purchases, for example by first materializing the list as sketched below.
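
One way to keep this list around for a campaign is to materialize the top-10% customers of each category into a table. This is only an illustrative sketch reusing the logic of the query above; the table name vip顾客名单 is a made-up example:

-- Materialize the top 10% of customers per category (illustrative table name)
create table 顾客数据.vip顾客名单 as
with sale_by_customer as (
  select "顾客序号", sum("件数"*"价格") as "销售额", cast('服装' as varchar(5)) as "销售种类"
    from "服装销售数据"."交易数据_2020_2023" as a
    left join "服装销售数据"."服装产品数据" as b on a."产品编号" = b."产品编号"
   group by "顾客序号"
  union all
  select "顾客序号", sum("件数"*"价格") as "销售额", cast('食品' as varchar(5)) as "销售种类"
    from "食品销售数据"."交易数据_2020_2023" as a
    left join "食品销售数据"."食品产品数据" as b on a."产品编号" = b."产品编号"
   group by "顾客序号"
),
customer_ranking as (
  select *,
         row_number() over (partition by "销售种类" order by "销售额" desc) as ranking,
         count(*) over (partition by "销售种类") as cnt
    from sale_by_customer
)
select "顾客序号", "销售种类" as "vip销售种类", "销售额"
  from customer_ranking
 where cast(ranking as decimal)/cnt <= 0.1;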

Query evaluation: query history feature application

Click "Query History" on the left menu bar to query the request history of previous SQL tasks according to the status and execution date.

"Query History" provides SQL task execution information, status, time and other information. The trial version will reserve the download function of historical request result sets for users, but the size of the result sets available for download cannot exceed 100MB.

Clicking the SQL text opens the detailed information for that SQL task.

As shown in the figure, the task details appear above the SQL text, and the two buttons in the upper right corner open the SQL text in "Data Insight" and copy it, respectively.

Epilogue

This completes the data example demonstration on PieCloudDB. You are welcome to log in to Tuoshupai's official website to try the PieCloudDB "Cloud on Cloud" version for free and start your own data exploration journey.

 
