How to migrate a Redshift data warehouse from AWS to Alibaba Cloud AnalyticDB for PostgreSQL

Alibaba Cloud AnalyticDB for PostgreSQL (hereafter ADB PG, formerly HybridDB for PostgreSQL) is a real-time data warehouse service with an MPP architecture built on the PostgreSQL core. It supports complex ETL tasks as well as high-performance online queries, and it is closely integrated with the Alibaba Cloud ecosystem. Redshift is AWS's MPP data warehouse service, likewise built on the PostgreSQL core engine, and is widely used as a data warehouse on AWS. ADB PG is highly compatible with Redshift, from architecture to syntax. This article describes how to migrate between the two data warehouse platforms.

Product architecture comparison

The latest version of Alibaba Cloud AnalyticDB for PostgreSQL, 6.0, is built on PostgreSQL 9.4, whereas Redshift is based on PostgreSQL 8.2. ADB PG is therefore the more feature-complete of the two and is fully compatible with PostgreSQL ecosystem tools, including analysis extensions such as PostGIS and MADlib. Redshift supports only column-store tables and does not support PostgreSQL's native row-store tables. ADB PG retains PostgreSQL row-store tables, which give high throughput for data update operations, and also supports column-store tables, which suit OLAP aggregation over large tables.
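
As a minimal sketch of the two storage models on ADB PG, the statements below create one row-store table and one column-store table. The table and column names are hypothetical and serve only as illustration. The storage options are the same ones used in the CREATE TABLE examples in this article.

```sql
-- Row-store (heap) table: no storage options are needed.
-- A reasonable choice for tables that receive frequent updates.
CREATE TABLE schema1.orders_row
(
    order_id BIGINT,
    status   VARCHAR(20)
)
DISTRIBUTED BY (order_id);

-- Column-store table: append-only storage with column orientation.
-- A reasonable choice for large tables scanned by OLAP aggregations.
CREATE TABLE schema1.orders_col
(
    order_id BIGINT,
    status   VARCHAR(20)
)
WITH (APPENDONLY=true, ORIENTATION=column)
DISTRIBUTED BY (order_id);
```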

Comparison of AnalyticDB for PostgreSQL and Redshift

| Feature | ADB PG | Redshift |
| --- | --- | --- |
| PostgreSQL version | PG 9.4 | PG 8.2 |
| SQL syntax | PostgreSQL-compatible, with partial Oracle syntax compatibility | PostgreSQL-compatible |
| Transactions | Supported | Supported |
| Row storage | Supported | Not supported |
| Column storage | Supported | Supported |
| Partitioned tables | Supported | Supported |
| Cloud storage | Online access to OSS data | Online access to S3 data |
| Multi-model analysis | PostGIS / MADlib / vector retrieval | |

Key syntax comparison and migration

Alibaba Cloud AnalyticDB for PostgreSQL and AWS Redshift are both based on the single-node PostgreSQL core engine, so their syntax is highly compatible. The few differences are described below.

Differences in CREATE TABLE (DDL) syntax

| Syntax | Redshift | ADB PG |
| --- | --- | --- |
| Hash distribution | DISTKEY (col) | DISTRIBUTED BY (col) |
| Random distribution | DISTSTYLE EVEN | DISTRIBUTED RANDOMLY |
| Replicated distribution | DISTSTYLE ALL | DISTRIBUTED REPLICATED |
| Compression encoding | AZ64/BYTEDICT/DELTA/LZO/RAW/RUNLENGTH/ZSTD | COMPRESSTYPE={ZSTD/ZLIB/QUICKLZ/RLE_TYPE/NONE} |
| Column-store sort key | SORTKEY (col) | WITH (APPENDONLY=true, ORIENTATION=column) SORTKEY (col) |
| System functions | PG 8.2 functions plus some custom functions | PG 9.4 functions plus some custom functions |
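
As a minimal sketch of the replicated-distribution row in the table above, a Redshift table declared with DISTSTYLE ALL maps to DISTRIBUTED REPLICATED in ADB PG. The schema, table, and column names are hypothetical and serve only as illustration.

```sql
-- Redshift: every node keeps a full copy of the table.
CREATE TABLE schema1.dim_region
(
    region_id   INTEGER,
    region_name VARCHAR(50)
)
DISTSTYLE ALL;

-- ADB PG: the same table declared as a replicated table.
CREATE TABLE schema1.dim_region
(
    region_id   INTEGER,
    region_name VARCHAR(50)
)
DISTRIBUTED REPLICATED;
```

Replicated distribution is usually reserved for small dimension tables that are joined often, because every node stores the whole table.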

Syntax guides

ADB PG CREATE TABLE syntax guide
Redshift CREATE TABLE syntax guide

DDL conversion example 1

A Redshift CREATE TABLE statement that declares a DISTKEY distribution key and sort key columns:

CREATE TABLE schema1.table1(
    filed1 VARCHAR(100) ENCODE lzo,
    filed2 INTEGER DISTKEY,
    filed3 INTEGER,
    filed4 BIGINT ENCODE lzo,
    filed5 INTEGER)
INTERLEAVED SORTKEY (
    filed1,
    filed2);

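The equivalent ADB PG CREATE TABLE statement:

CREATE TABLE schema1.table1
(
    filed1 VARCHAR(100),
    filed2 INTEGER,
    filed3 INTEGER,
    filed4 BIGINT,
    filed5 INTEGER
)
-- Table-level zlib compression replaces the per-column ENCODE lzo settings.
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib)
-- The column-level DISTKEY becomes a DISTRIBUTED BY clause.
DISTRIBUTED BY (filed2)
SORTKEY
(
    filed1,
    filed2
);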

DDL conversion example 2

A Redshift CREATE TABLE statement that uses the ENCODE and SORTKEY options:

CREATE TABLE schema2.table2
(
    filed1 VARCHAR(50) ENCODE lzo,
    filed2 VARCHAR(50) ENCODE lzo,
    filed3 VARCHAR(20) ENCODE lzo
)
DISTSTYLE EVEN
INTERLEAVED SORTKEY
(
    filed1
);

The equivalent ADB PG CREATE TABLE statement:

CREATE TABLE schema2.table2(
    filed1 VARCHAR(50),
    filed2 VARCHAR(50),
    filed3 VARCHAR(20))
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib)
-- DISTSTYLE EVEN becomes random distribution.
DISTRIBUTED RANDOMLY
SORTKEY
(
    filed1
);

Data migration

Both Redshift and ADB PG support parallel import and export of data through cloud storage. Migrating data from Redshift to AnalyticDB for PostgreSQL involves the following steps:

  1. Prepare resources and the environment. Before you begin, prepare the required Amazon Redshift, Amazon S3 (Amazon Simple Storage Service), AnalyticDB for PostgreSQL, and Alibaba Cloud Object Storage Service (OSS) resources.
  2. Export the data from Redshift to Amazon S3.
  3. Use OSSImport to copy the CSV data files from Amazon S3 to OSS.
  4. In the target AnalyticDB for PostgreSQL instance, create the objects that correspond to those in the source Redshift cluster, including schemas, tables, views, and functions.
  5. Import the data from OSS into AnalyticDB for PostgreSQL through OSS external tables.
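
Steps 2 and 5 can be sketched in SQL as follows. This is an illustration under stated assumptions, not a definitive procedure. The bucket names, object prefix, IAM role ARN, OSS endpoint, and credentials are hypothetical placeholders, and the oss:// LOCATION parameters should be checked against the current AnalyticDB for PostgreSQL documentation for the instance version in use. The table is schema1.table1 from DDL conversion example 1.

```sql
-- Step 2 (run on Redshift): export the table to Amazon S3 as CSV files.
-- The bucket, prefix, and IAM role below are hypothetical placeholders.
-- PARALLEL ON lets each slice write its own files.
UNLOAD ('SELECT * FROM schema1.table1')
TO 's3://my-migration-bucket/unload/table1_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
FORMAT AS CSV
PARALLEL ON
ALLOWOVERWRITE;

-- Step 5 (run on ADB PG): read the CSV files from OSS through an
-- external table, then load them into the target table.
-- Assumption: the oss_ext extension provides the oss:// protocol.
CREATE EXTENSION IF NOT EXISTS oss_ext;

-- The endpoint, prefix, bucket, and credentials are hypothetical placeholders.
CREATE READABLE EXTERNAL TABLE schema1.table1_oss_ext
(
    filed1 VARCHAR(100),
    filed2 INTEGER,
    filed3 INTEGER,
    filed4 BIGINT,
    filed5 INTEGER
)
LOCATION ('oss://oss-cn-hangzhou.aliyuncs.com prefix=unload/table1_ id=<access_key_id> key=<access_key_secret> bucket=my-oss-bucket')
FORMAT 'csv'
ENCODING 'UTF8';

-- Load from the external table into the table created in step 4.
INSERT INTO schema1.table1
SELECT * FROM schema1.table1_oss_ext;
```

The external table stores no data itself. It only describes where the files are in OSS, so it can be dropped once the INSERT has completed and the row counts have been checked against the source.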

The overall migration path is Amazon Redshift → Amazon S3 → OSS (via OSSImport) → AnalyticDB for PostgreSQL.


This article is original content from Alibaba Cloud and may not be reproduced without permission.


Origin blog.csdn.net/yunqiinsight/article/details/103970384