AWS Big Data in Action Lab 5: Data Warehouse and Visualization (6)

In this lab, you will learn how to use Amazon Redshift and Amazon QuickSight to build a data visualization application. You will see how to load data from the data lake into Amazon's data warehouse and display it with a fully managed data visualization tool.

The objectives of this lab are:

  • 1. Create a Redshift cluster
  • 2. Bulk-load the S3 data files into the Redshift database
  • 3. Use QuickSight to visualize the data table

The architecture of this lab is shown below:

image-20210320135740643

Build a data warehouse

1. View the data

Check whether the Parquet files generated in the EMR lab exist in the S3 bucket (here s3://lab-921283538843-wzlinux-com/spark/output).

image-20210320135932095
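If you prefer the command line, the same check can be done with the AWS CLI (the bucket name below is the one used in this lab; replace it with your own):

```shell
# List the Parquet output files written by the EMR lab job.
# Requires AWS credentials configured for the lab account.
aws s3 ls s3://lab-921283538843-wzlinux-com/spark/output/
```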

2. Create IAM Role

Open the IAM service, click Roles -> Create Role, and select Redshift as the trusted service.

image-20210320140034950

Select Redshift-Customizable, then click Next: Permissions.

image-20210320140109645

Attach the AmazonS3ReadOnlyAccess permission policy.

image-20210320140136981

Name the role myRedshiftRole and click Create role.

image-20210320140206805
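For reference, the trust policy this wizard generates behind the scenes looks roughly like the following (a sketch; the console creates it for you, so you do not need to write it by hand):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

This trust policy lets the Redshift service assume the role, while the attached AmazonS3ReadOnlyAccess policy grants the actual S3 permissions.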

3. Create a subnet group

Before creating a Redshift cluster, create a subnet group. Open the Redshift service and select "CONFIG" -> "Manage Subnet Group" in the left menu.

image-20210320140333267

Then select "Create a cluster subnet group". Accept the default subnet group name "cluster-subnet-group-1" and enter any descriptive text in the description box. Select "Default VPC", select "Add all subnets for this VPC", and then click "Create cluster subnet group" to finish creating the subnet group.

image-20210320140432628

4. Create a Redshift cluster

Select "Cluster" in the left menu, click "Create Cluster", and set the cluster name (use only English letters, digits, and hyphens; it must start with a letter, and must not contain Chinese or other special characters). Select node type dc2.large.

image-20210320140610586

Accept the defaults for the database configuration and enter a master user password (remember it; you will need it later).

image-20210320140856650

In the cluster permissions, select the myRedshiftRole role created earlier, and click "Associate IAM role"

image-20210320194526535

In the additional configurations, select the default VPC, the default security group, and the cluster subnet group created earlier, then click "Create Cluster". After about 5 minutes, the cluster status becomes "Available".
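The console steps above can also be expressed as a single AWS CLI call. The following is a sketch; the identifier, user name, and password are example values, and the account ID in the role ARN must be replaced with your own:

```shell
# Provision a 2-node dc2.large Redshift cluster in the subnet group
# created earlier, with the myRedshiftRole IAM role attached.
aws redshift create-cluster \
  --cluster-identifier redshift-cluster-1 \
  --node-type dc2.large \
  --number-of-nodes 2 \
  --master-username awsuser \
  --master-user-password 'YourPassword1' \
  --cluster-subnet-group-name cluster-subnet-group-1 \
  --iam-roles arn:aws:iam::921283538843:role/myRedshiftRole
```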

5. Access the Redshift database

There are two ways to access the Redshift database, one is through the query editor on the Redshift Console, and the other is through a SQL client (such as SQL Workbench/J client).

In this lab, for convenience, use the query editor in the Redshift console to access the database. Select "Editor" in the left menu, enter the connection parameters in the "Connect to database" window, and click "Connect to database".

image-20210320141404147

6. Create a table

Create a table in the query editor: select the "public" schema in the schema selector on the left, then enter the following SQL statement in the query window:

create table table1(
    tno varchar(20),
    tdate varchar(15),
    uno varchar(10),
    pno varchar(10),
    tnum int,
    uname varchar(20),
    umobile varchar(20),
    ano varchar(20),
    acity varchar(50),
    aname varchar(50),
    pclass varchar(10),
    pname varchar(50),
    price decimal(10, 2)
);

As shown below

image-20210320141517613

Select "Run" and the result should show "Completed"
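You can confirm the table was created with a quick catalog query against Redshift's pg_table_def system view (run it in the same editor; this is an optional check):

```sql
-- List the columns of table1 as Redshift sees them.
-- The column order must match the Parquet file schema,
-- because COPY from Parquet maps columns by position.
select "column", type
from pg_table_def
where schemaname = 'public'
  and tablename = 'table1';
```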

7. Import S3 data


Open a new SQL query window (here Query 2) and enter the SQL command below to load the S3 data. Be sure to replace the account ID with your own, and confirm the S3 bucket path you obtained earlier.

copy table1 from 's3://lab-921283538843-wzlinux-com/spark/output/' 
credentials 'aws_iam_role=arn:aws:iam::921283538843:role/myRedshiftRole' 
format as parquet; 

As shown below

image-20210320195005679

Click Run; the result should show "Completed". Enter "select * from table1;" in Query 3 to view the rows, and "select count(*) from table1;" in Query 4 to count them. The data in the table should be returned, which shows that the data in S3 has been copied into the Redshift data warehouse.

image-20210320202703451
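If the COPY fails instead of completing, Redshift records the reason in system tables. A troubleshooting sketch (for Parquet loads, some errors may also appear in SVL_S3LOG):

```sql
-- Show the most recent load errors, newest first.
select starttime, filename, err_reason
from stl_load_errors
order by starttime desc
limit 10;
```

A common cause is a mismatch between the table's column order and the Parquet schema, or a role ARN pointing at the wrong account ID.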

8. Allow Internet access

In the next step, we will use AWS QuickSight to visualize the data in Redshift. Before that, QuickSight needs to be able to reach Redshift from the Internet. First, allocate an Elastic IP address in the EC2 console (the process is omitted here). Then modify the Redshift cluster properties to grant public access.

image-20210320202959325

Change "Publicly accessible" to "Yes" and select the Elastic IP address allocated earlier.

image-20210320203019533

This operation takes a while, just wait a few minutes.

Data visualization

1. Enable Quicksight

Enabling QuickSight is not covered here; see Lab 3 for the steps.

2. Create a data set

Enter the QuickSight console, click "Datasets" on the left, and choose "New dataset".

image-20210320203332064

Select the Redshift (auto-discovered) data source. Redshift also supports a manual connection method, but it is not demonstrated here.

image-20210320203413509

Enter the connection parameters and select "Create data source". Choose the corresponding Redshift database, taking care to configure the correct address, port, database name, user name, and password.

image-20210320203516186

Select table1, click "Select", and finally click "Visualize" to complete the dataset creation (here we choose to import the data from Redshift into QuickSight, so analysis will be much faster).

image-20210320203709754

3. Data visualization

Open the visualization object window and select the display mode as "vertical bar graph"

image-20210320203815481

Drag tdate to the X axis field well, and drag tnum to the Value field well (the system automatically selects Count as the aggregate).

image-20210320204047396

This completes the chart showing the date on the X axis and each day's total sales volume on the Y axis, sorted from high to low.
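The aggregation QuickSight performs for this chart can be checked directly in the Redshift query editor. A sketch using sum(tnum) as the daily total (QuickSight defaults to Count, which can be switched to Sum in the field's aggregate menu):

```sql
-- Daily totals of tnum, highest first: the data behind the bar chart.
select tdate, sum(tnum) as total_tnum
from table1
group by tdate
order by total_tnum desc;
```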


Origin: blog.51cto.com/wzlinux/2678128