Based on the innovative HSAP architecture, Hologres can unify the OLAP systems (Greenplum, Presto, Impala, ClickHouse) and KV database/serving systems (HBase, Redis) of a traditional data warehouse architecture into a single big data compute engine, providing fast, integrated offline and real-time analytics.
- Hologres 5000CU, 20GB storage free trial, go to trial>>
Core product advantages:
1. Simplifies the data warehouse architecture, reducing data movement and redundant maintenance costs
2. Strong real-time query performance, setting a new world record on TPC-H at the 30,000 GB scale
3. Integrated lake house queries: offline MaxCompute data can be queried with zero ETL
Introduction to the Hologres tutorial
Based on the TPC-H dataset in MaxCompute and the GitHub public event dataset, this tutorial for Hologres, the Alibaba Cloud real-time data warehouse, walks you through creating a Hologres database, foreign tables, and internal tables, importing data into the internal tables, and then querying both internal and foreign tables from Hologres. Hologres returns these query results with very low latency.
Prepare environment and resources
Before starting the tutorial, prepare the environment and resources as follows:
- A VPC and a vSwitch have been created. For details, see Create a VPC and a vSwitch.
- Go to the Alibaba Cloud free trial page. Click the Log In/Sign Up button in the upper-right corner of the page and follow the prompts to log in (if you already have an Alibaba Cloud account), register (if you do not), or complete real-name verification (personal or enterprise, as required by the trial product).
- After logging in, choose Big Data Computing > Data Computing and Analysis in the product categories, and click Try Now on the Hologres real-time data warehouse card.
- On the panel that appears, configure the parameters for the Hologres trial. This tutorial uses the values in the following table as an example; keep the default values for any parameters not mentioned.
| Parameter | Example value |
| --- | --- |
| Region | East China 1 (Hangzhou) |
| Instance type | General-purpose |
| Compute resources | 8 cores, 32 GB (compute node count: 1) |
| VPC | Select the VPC created earlier. |
| vSwitch | Select the vSwitch created earlier. |
| Instance name | hologres_test |
| Resource group | Default resource group |
- Read and agree to the service agreement, click Try Now, and follow the on-screen prompts to complete the trial application.
- Click Go to Console to start the trial.
Create a database
Create a database in Hologres to store the sample data used in the subsequent queries.
- Log in to the Hologres console and click Instances in the left-side navigation pane.
- On the Instances page, click the name of the target instance.
- In the left-side navigation pane of the instance details page, click Database Management.
- On the DB Authorization page, click Add Database in the upper-right corner.
- In the New Database dialog box, configure the following parameters.
| Parameter | Description |
| --- | --- |
| Instance Name | The Hologres instance on which to create the database. The currently logged-in instance is selected by default; you can choose another instance from the drop-down list. |
| Database Name | Set to holo_tutorial in this example. |
| Simple Permission Policy | The permission policy for the new database. SPM (simple permission model): permissions are granted at the database granularity through four roles, admin (administrator), developer, writer (read and write), and viewer (analyst); a small set of management functions covers convenient, safe permission management of the objects in the database. SLPM (schema-level simple permission model): permissions are granted at the schema granularity through the roles <db>.admin (database administrator), <db>.<schema>.developer, <db>.<schema>.writer, and <db>.<schema>.viewer, which is finer-grained than SPM. Expert: Hologres is compatible with PostgreSQL and uses the native PostgreSQL permission system. |
- Click OK .
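Under either simple permission model, role membership is managed with helper functions rather than raw GRANT statements. A minimal sketch, assuming the `spm_grant`/`slpm_grant` helper functions from the Hologres permission-model documentation and a placeholder account ID:

```sql
-- SPM: add a user to the database-level developer role
-- (run as an admin of the holo_tutorial database; the account ID is a placeholder).
CALL spm_grant('holo_tutorial_developer', 'your_aliyun_account_id');

-- SLPM: add a user to the schema-level viewer role.
CALL slpm_grant('holo_tutorial.public.viewer', 'your_aliyun_account_id');
```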
Create tables
After the database is created, create the required tables in it.
- Log in to the database.
  - On the top menu bar of the DB Authorization page, click Metadata Management.
  - On the Metadata Management page, double-click the name of the database you created in the directory tree on the left, and click OK.
- Create foreign tables.
  - On the SQL Editor page, click the icon in the upper-left corner.
  - Create foreign tables that map to the TPC-H dataset. The TPC-H data is provided by TPC; for more information, see TPC.
    On the new temporary Query tab, select the instance name and database you created, enter the sample code below in the SQL editor, and click Run.
    The sample SQL statements create foreign tables that map to tables such as odps_customer_10g and odps_lineitem_10g in the MaxCompute public project MAXCOMPUTE_PUBLIC_DATA for subsequent queries.
DROP FOREIGN TABLE IF EXISTS odps_customer_10g;
DROP FOREIGN TABLE IF EXISTS odps_lineitem_10g;
DROP FOREIGN TABLE IF EXISTS odps_nation_10g;
DROP FOREIGN TABLE IF EXISTS odps_orders_10g;
DROP FOREIGN TABLE IF EXISTS odps_part_10g;
DROP FOREIGN TABLE IF EXISTS odps_partsupp_10g;
DROP FOREIGN TABLE IF EXISTS odps_region_10g;
DROP FOREIGN TABLE IF EXISTS odps_supplier_10g;
IMPORT FOREIGN SCHEMA "MAXCOMPUTE_PUBLIC_DATA#default" LIMIT to
(
odps_customer_10g,
odps_lineitem_10g,
odps_nation_10g,
odps_orders_10g,
odps_part_10g,
odps_partsupp_10g,
odps_region_10g,
odps_supplier_10g
)
FROM SERVER odps_server INTO public OPTIONS(if_table_exist 'error',if_unsupported_type 'error');
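To confirm that the foreign tables were imported and map correctly, you can preview a few rows and list the imported tables. This is an illustrative check, not part of the original tutorial steps:

```sql
-- Preview a small foreign table; the read is served from MaxCompute.
SELECT * FROM public.odps_region_10g LIMIT 5;

-- List the foreign tables just imported into the public schema.
SELECT foreign_table_name
FROM information_schema.foreign_tables
WHERE foreign_table_schema = 'public';
```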
  - Create a foreign table that maps to the GitHub public event data. The data is sourced from GitHub; for more information, see Offline and Real-time Integration Practice Based on GitHub's Public Event Dataset.
    Click the icon in the upper-left corner. On the new temporary Query tab, select the instance name and database you created, enter the sample code below in the SQL editor, and click Run.
    The sample SQL statement creates a foreign table named dwd_github_events_odps that maps to the table of the same name under the github_events schema in the MaxCompute public project MAXCOMPUTE_PUBLIC_DATA for subsequent queries.
DROP FOREIGN TABLE IF EXISTS dwd_github_events_odps;
IMPORT FOREIGN SCHEMA "MAXCOMPUTE_PUBLIC_DATA#github_events" LIMIT to
(
dwd_github_events_odps
)
FROM SERVER odps_server INTO public OPTIONS(if_table_exist 'error',if_unsupported_type 'error');
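A similar spot check can be run against the GitHub foreign table; this query is illustrative and assumes the partition column `ds` holds text dates, as used later in the import step:

```sql
-- Preview yesterday's partition of the GitHub events foreign table.
SELECT id, type, repo_name, created_at
FROM public.dwd_github_events_odps
WHERE ds = (CURRENT_DATE - interval '1 day')::text
LIMIT 5;
```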
- Create internal tables.
  - On the SQL Editor page, click the icon in the upper-left corner.
  - Create internal tables for the TPC-H dataset.
    On the new temporary Query tab, select the instance name and database you created, enter the following statements in the SQL editor, and click Run.
    The sample SQL statements create tables named LINEITEM, ORDERS, PARTSUPP, PART, CUSTOMER, SUPPLIER, NATION, and REGION to store the imported data.
DROP TABLE IF EXISTS LINEITEM;
BEGIN;
CREATE TABLE LINEITEM
(
L_ORDERKEY BIGINT NOT NULL,
L_PARTKEY INT NOT NULL,
L_SUPPKEY INT NOT NULL,
L_LINENUMBER INT NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG TEXT NOT NULL,
L_LINESTATUS TEXT NOT NULL,
L_SHIPDATE TIMESTAMPTZ NOT NULL,
L_COMMITDATE TIMESTAMPTZ NOT NULL,
L_RECEIPTDATE TIMESTAMPTZ NOT NULL,
L_SHIPINSTRUCT TEXT NOT NULL,
L_SHIPMODE TEXT NOT NULL,
L_COMMENT TEXT NOT NULL,
PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS ORDERS;
BEGIN;
CREATE TABLE ORDERS
(
O_ORDERKEY BIGINT NOT NULL PRIMARY KEY,
O_CUSTKEY INT NOT NULL,
O_ORDERSTATUS TEXT NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE timestamptz NOT NULL,
O_ORDERPRIORITY TEXT NOT NULL,
O_CLERK TEXT NOT NULL,
O_SHIPPRIORITY INT NOT NULL,
O_COMMENT TEXT NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PARTSUPP;
BEGIN;
CREATE TABLE PARTSUPP
(
PS_PARTKEY INT NOT NULL,
PS_SUPPKEY INT NOT NULL,
PS_AVAILQTY INT NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT TEXT NOT NULL,
PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY)
);
CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');
CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');
CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');
CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PART;
BEGIN;
CREATE TABLE PART
(
P_PARTKEY INT NOT NULL PRIMARY KEY,
P_NAME TEXT NOT NULL,
P_MFGR TEXT NOT NULL,
P_BRAND TEXT NOT NULL,
P_TYPE TEXT NOT NULL,
P_SIZE INT NOT NULL,
P_CONTAINER TEXT NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT TEXT NOT NULL
);
CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');
CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS CUSTOMER;
BEGIN;
CREATE TABLE CUSTOMER
(
C_CUSTKEY INT NOT NULL PRIMARY KEY,
C_NAME TEXT NOT NULL,
C_ADDRESS TEXT NOT NULL,
C_NATIONKEY INT NOT NULL,
C_PHONE TEXT NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT TEXT NOT NULL,
C_COMMENT TEXT NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS SUPPLIER;
BEGIN;
CREATE TABLE SUPPLIER
(
S_SUPPKEY INT NOT NULL PRIMARY KEY,
S_NAME TEXT NOT NULL,
S_ADDRESS TEXT NOT NULL,
S_NATIONKEY INT NOT NULL,
S_PHONE TEXT NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT TEXT NOT NULL
);
CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS NATION;
BEGIN;
CREATE TABLE NATION(
N_NATIONKEY INT NOT NULL PRIMARY KEY,
N_NAME text NOT NULL,
N_REGIONKEY INT NOT NULL,
N_COMMENT text NOT NULL
);
CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');
CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');
CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS REGION;
BEGIN;
CREATE TABLE REGION
(
R_REGIONKEY INT NOT NULL PRIMARY KEY,
R_NAME TEXT NOT NULL,
R_COMMENT TEXT
);
CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');
CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');
COMMIT;
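The `set_table_property` calls above configure the physical layout that drives query performance: `distribution_key` controls sharding, `clustering_key` and `segment_key` control on-disk ordering and file pruning, and `bitmap_columns`/`dictionary_encoding_columns` accelerate filtering on string columns. To verify what was applied, you can query the system view; this sketch assumes the `hologres.hg_table_properties` view described in the Hologres documentation:

```sql
-- Inspect the properties set on the LINEITEM table (assumed system view).
SELECT property_key, property_value
FROM hologres.hg_table_properties
WHERE table_namespace = 'public'
  AND table_name = 'lineitem';
```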
  - Create an internal table for the GitHub public event data.
    Click the icon in the upper-left corner. On the new temporary Query tab, select the instance name and database you created, enter the sample code below in the SQL editor, and click Run.
    The sample SQL statement creates an internal table named gh_event_data and sets the distribution_key, event_time_column, and clustering_key table properties for subsequent data import and high-performance queries.
DROP TABLE IF EXISTS gh_event_data;
BEGIN;
CREATE TABLE gh_event_data (
id bigint,
actor_id bigint,
actor_login text,
repo_id bigint,
repo_name text,
org_id bigint,
org_login text,
type text,
created_at timestamp with time zone NOT NULL,
action text,
iss_or_pr_id bigint,
number bigint,
comment_id bigint,
commit_id text,
member_id bigint,
rev_or_push_or_rel_id bigint,
ref text,
ref_type text,
state text,
author_association text,
language text,
merged boolean,
merged_at timestamp with time zone,
additions bigint,
deletions bigint,
changed_files bigint,
push_size bigint,
push_distinct_size bigint,
hr text,
month text,
year text,
ds text
);
CALL set_table_property('public.gh_event_data', 'distribution_key', 'id');
CALL set_table_property('public.gh_event_data', 'event_time_column', 'created_at');
CALL set_table_property('public.gh_event_data', 'clustering_key', 'created_at');
COMMIT;
Import sample data
After the internal tables are created, import the sample data into them by following these steps. Foreign tables do not store data in Hologres; they only store field mappings. Through the foreign tables, Hologres can directly read the data stored in the MaxCompute public project MAXCOMPUTE_PUBLIC_DATA.
- On the SQL Editor page, click the icon in the upper-left corner.
- Import the TPC-H dataset.
  On the new temporary Query tab, select the instance name and database you created, enter the sample code below in the SQL editor, and click Run.
  The sample SQL statements import data from tables such as public.odps_customer_10g and public.odps_lineitem_10g in the MaxCompute public project MAXCOMPUTE_PUBLIC_DATA into the internal tables with the corresponding names for subsequent queries. The final vacuum and analyze statements compact the newly written data and collect statistics so the optimizer produces good plans.
INSERT INTO public.customer SELECT * FROM public.odps_customer_10g ;
INSERT INTO public.lineitem SELECT * FROM public.odps_lineitem_10g ;
INSERT INTO public.nation SELECT * FROM public.odps_nation_10g ;
INSERT INTO public.orders SELECT * FROM public.odps_orders_10g ;
INSERT INTO public.part SELECT * FROM public.odps_part_10g ;
INSERT INTO public.partsupp SELECT * FROM public.odps_partsupp_10g ;
INSERT INTO public.region SELECT * FROM public.odps_region_10g ;
INSERT INTO public.supplier SELECT * FROM public.odps_supplier_10g ;
vacuum nation;
vacuum region;
vacuum supplier;
vacuum customer;
vacuum part;
vacuum partsupp;
vacuum orders;
vacuum lineitem;
analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;
analyze lineitem (l_orderkey,l_partkey,l_suppkey);
analyze orders (o_custkey);
analyze partsupp(ps_partkey,ps_suppkey);
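After the import finishes, a quick row-count check confirms that all tables were populated; this is an illustrative verification, not a tutorial step, and the exact counts depend on the 10 GB TPC-H scale factor:

```sql
-- Sanity-check row counts in a few of the imported internal tables.
SELECT 'lineitem' AS tbl, COUNT(*) AS rows FROM lineitem
UNION ALL
SELECT 'orders', COUNT(*) FROM orders
UNION ALL
SELECT 'customer', COUNT(*) FROM customer;
```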
- Import the GitHub public event data.
  Click the icon in the upper-left corner. On the new temporary Query tab, select the instance name and database you created, enter the sample code below in the SQL editor, and click Run.
  The sample SQL statement imports the previous day's data from the dwd_github_events_odps table in the MaxCompute public project MAXCOMPUTE_PUBLIC_DATA into the internal table for subsequent queries. Because the trial instance's resources are limited, it is recommended to import and query no more than 15 days of data.
INSERT INTO gh_event_data
SELECT
*
FROM
dwd_github_events_odps
WHERE
ds >= (CURRENT_DATE - interval '1 day')::text;
analyze gh_event_data;
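To confirm how much data was imported per partition day, an illustrative check (not part of the original tutorial):

```sql
-- Count imported GitHub events per partition day.
SELECT ds, COUNT(*) AS events
FROM gh_event_data
GROUP BY ds
ORDER BY ds;
```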
Query the data in the table
- On the SQL Editor page, click the icon in the upper-left corner.
- Query based on the TPC-H dataset.
  On the new temporary Query tab, select the instance name and database you created, enter the sample code below in the SQL editor, and click Run.
  The following SQL code queries the internal tables. To query the foreign tables instead, replace the table names in the code with the corresponding foreign table names.
  For the 22 query statements derived from TPC-H, see Query data in tables.
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '120' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
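The statement above is TPC-H Q1, the pricing summary report. To run the same analysis directly against MaxCompute through the foreign table, only the table name changes; an abridged sketch (fewer aggregates, for brevity), which you should expect to run slower because the data is not stored in Hologres:

```sql
-- Q1 against the foreign table instead of the internal lineitem table.
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    count(*) as count_order
from
    odps_lineitem_10g
where
    l_shipdate <= date '1998-12-01' - interval '120' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
```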
- Query based on GitHub public event data. Click the icon in the upper-left corner. On the new temporary Query tab, select the instance name and database you created, enter the sample code below in the SQL editor, and click Run. This section provides a few simple analysis statements; you can design other analyses based on the fields in the table. The following SQL code queries the internal table; to query the foreign table instead, replace the table name accordingly.
- Query yesterday's most active projects.
SELECT
repo_name,
COUNT(*) AS events
FROM
gh_event_data
WHERE
created_at >= CURRENT_DATE - interval '1 day'
GROUP BY
repo_name
ORDER BY
events DESC
LIMIT 5;
- Query yesterday's most active developers.
SELECT
actor_login,
COUNT(*) AS events
FROM
gh_event_data
WHERE
created_at >= CURRENT_DATE - interval '1 day'
AND actor_login NOT LIKE '%[bot]'
GROUP BY
actor_login
ORDER BY
events DESC
LIMIT 5;
- Query yesterday's programming language ranking.
SELECT
language,
count(*) total
FROM
gh_event_data
WHERE
created_at > CURRENT_DATE - interval '1 day'
AND language IS NOT NULL
GROUP BY
language
ORDER BY
total DESC
LIMIT 10;
- Query projects ranked by the number of new stars gained yesterday (un-starring is not considered).
SELECT
repo_id,
repo_name,
COUNT(actor_login) total
FROM
gh_event_data
WHERE
type = 'WatchEvent'
AND created_at > CURRENT_DATE - interval '1 day'
GROUP BY
repo_id,
repo_name
ORDER BY
total DESC
LIMIT 10;
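Beyond these examples, the hr, month, and year helper columns in gh_event_data support simple time-bucketed analyses; an illustrative query of yesterday's events by hour:

```sql
-- Distribution of yesterday's GitHub events by hour of day.
SELECT hr, COUNT(*) AS events
FROM gh_event_data
WHERE created_at >= CURRENT_DATE - interval '1 day'
GROUP BY hr
ORDER BY hr;
```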
Finish
After completing the preceding operations, you have successfully queried data with Hologres. When a query command succeeds, the Result tab appears under the temporary Query page and displays the query results, for example:
- Query results based on the TPC-H dataset.
- Query results based on GitHub public event data:
  - Yesterday's most active projects
  - Yesterday's most active developers
  - Yesterday's programming language ranking
  - Yesterday's new stars per project
Learn more about Hologres: https://www.aliyun.com/product/bigdata/hologram
Welcome to the Hologres developer community