AWS redshift->hdb pg(Greenplum),DDL语法转换规则

背景
redshift 和 Greenplum都是源自PostgreSQL的MPP数据库:

https://wiki.postgresql.org/wiki/PostgreSQL_derived_databases

语法上有一些差异,从redshift切换到greenplum(或阿里云hdb pg)时,DDL需要做一定的转换。

redshift create table
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html

CREATE [ [LOCAL ] { TEMPORARY | TEMP } ] TABLE
[ IF NOT EXISTS ] table_name
( { column_name data_type [column_attributes] [ column_constraints ]
| table_constraints
| LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] }
[, ... ] )
[ BACKUP { YES | NO } ] -- gpdb 不支持,(指定是否自动快照备份)
[table_attribute]
where column_attributes are:

[ DEFAULT default_expr ]
[ IDENTITY ( seed, step ) ] -- gpdb 不支持该语法,使用SEQUENCE+DEFAULT代替
[ ENCODE encoding ] -- 指定列压缩算法,gpdb 不支持该语法,使用全局压缩算法代替
[ DISTKEY ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替
[ SORTKEY ] -- gpdb 不支持,阿里云hdb pg支持,语法参考后面的转换规则。
and column_constraints are:

[ { NOT NULL | NULL } ]
[ { UNIQUE | PRIMARY KEY } ]
[ REFERENCES reftable [ ( refcolumn ) ] ]
and table_constraints are:

[ UNIQUE ( column_name [, ... ] ) ]
[ PRIMARY KEY ( column_name [, ... ] ) ]
[ FOREIGN KEY (column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ]
and table_attributes are:

[ DISTSTYLE { EVEN | KEY | ALL } ] -- gpdb 不支持该语法,even使用distributed randomly代替,key使用distributed by (colname1,...) 代替,ALL不支持(维度表,所有SEGMENT都有一份)。
[ DISTKEY ( column_name ) ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替
[ [COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ] -- gpdb 不支持,阿里云HDB PG支持sortkey,语法参考后面的转换规则
greenplum(阿里云hdb pg) create table
https://gpdb.docs.pivotal.io/43300/ref_guide/sql_commands/CREATE_TABLE.html

Synopsis

CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name (
[ { column_name data_type [ DEFAULT default_expr ]
[column_constraint [ ... ]
[ ENCODING ( storage_directive [,...] ) ]
]
| table_constraint
| LIKE other_table [{INCLUDING | EXCLUDING}

                  {DEFAULTS | CONSTRAINTS}] ...}  

[, ... ] ]
)
[ INHERITS ( parent_table [, ... ] ) ]
[ WITH ( storage_parameter=value [, ... ] )
[ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
[ TABLESPACE tablespace ]
[ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]
[ PARTITION BY partition_type (column)

   [ SUBPARTITION BY partition_type (column) ]   
      [ SUBPARTITION TEMPLATE ( template_spec ) ]  
   [...]  
( partition_spec )   
    | [ SUBPARTITION BY partition_type (column) ]  
      [...]  
( partition_spec  
  [ ( subpartition_spec  
       [(...)]   
     ) ]   
)  

where column_constraint is:

[CONSTRAINT constraint_name]
NOT NULL | NULL
| UNIQUE [USING INDEX TABLESPACE tablespace]

        [WITH ( FILLFACTOR = value )]  

| PRIMARY KEY [USING INDEX TABLESPACE tablespace]

             [WITH ( FILLFACTOR = value )]  

| CHECK ( expression )
| REFERENCES table_name [ ( column_name [, ... ] ) ]

        [ key_match_type ]  
        [ key_action ]  

where storage_directive for a column is:

COMPRESSTYPE={ZLIB | QUICKLZ | RLE_TYPE | NONE}

[COMPRESSLEVEL={0-9} ]  
[BLOCKSIZE={8192-2097152} ]  

where storage_parameter for the table is:

APPENDONLY={TRUE|FALSE}
BLOCKSIZE={8192-2097152}
ORIENTATION={COLUMN|ROW}
CHECKSUM={TRUE|FALSE}
COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
COMPRESSLEVEL={0-9}
FILLFACTOR={10-100}
OIDS[=TRUE|FALSE]
and table_constraint is:

[CONSTRAINT constraint_name]
UNIQUE ( column_name [, ... ] )

      [USING INDEX TABLESPACE tablespace]   
      [WITH ( FILLFACTOR=value )]   

| PRIMARY KEY ( column_name [, ... ] )

             [USING INDEX TABLESPACE tablespace]   
             [WITH ( FILLFACTOR=value )]   

| CHECK ( expression )
| FOREIGN KEY ( column_name [, ... ] )

        REFERENCES table_name [ ( column_name [, ... ] ) ]  
        [ key_match_type ]  
        [ key_action ]  
        [ key_checking_mode ]  

where key_match_type is:

MATCH FULL  

| SIMPLE
where key_action is:

ON DELETE   

| ON UPDATE
| NO ACTION
| RESTRICT
| CASCADE
| SET NULL
| SET DEFAULT
where key_checking_mode is:

DEFERRABLE  

| NOT DEFERRABLE
| INITIALLY DEFERRED
| INITIALLY IMMEDIATE
where partition_type is:

LIST  

| RANGE
where partition_specification is:

partition_element [, ...]
and partition_element is:

DEFAULT PARTITION name
| [PARTITION name] VALUES (list_value [,...] )
| [PARTITION name]

 START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]  
 [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]  
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]  

| [PARTITION name]

 END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]  
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]  

[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[ TABLESPACE tablespace ]
where subpartition_spec or template_spec is:

subpartition_element [, ...]
and subpartition_element is:

DEFAULT SUBPARTITION name
| [SUBPARTITION name] VALUES (list_value [,...] )
| [SUBPARTITION name]

 START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]  
 [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]  
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]  

| [SUBPARTITION name]

 END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]  
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]  

[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[ TABLESPACE tablespace ]
where storage_parameter for a partition is:

APPENDONLY={TRUE|FALSE}
BLOCKSIZE={8192-2097152}
ORIENTATION={COLUMN|ROW}
CHECKSUM={TRUE|FALSE}
COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
COMPRESSLEVEL={1-9}
FILLFACTOR={10-100}
OIDS[=TRUE|FALSE]
阿里云 hdb pg sortkey 语法
https://help.aliyun.com/knowledge_detail/59195.html

[ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]
[ SORTKEY (column, [ ... ] )]
例子

create table test(date text, time text, open float, high float, low float, volume int) with(APPENDONLY=true,ORIENTATION=column) sortkey (volume);

ALTER [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name SET SORTKEY (column, [ ... ] )
转换规则
1、去除完全不支持的语法

[ BACKUP { YES | NO } ] -- gpdb 不支持,(指定是否自动快照备份)
2、转换不支持但兼容的语法

where column_attributes are:

2.1、

[ IDENTITY ( seed, step ) ] -- gpdb 不支持该语法,使用SEQUENCE+DEFAULT代替
去掉并转换为:

create sequence seq1 start with seed increment by step;
create table test (
id int default nextval('seq1')
);
例如

postgres=# create sequence seq1 start with 100 increment by 2;
CREATE SEQUENCE
postgres=# select nextval('seq1');

nextval

 100  

(1 row)

postgres=# select nextval('seq1');

nextval

 102  

(1 row)
2.2、

[ ENCODE encoding ] -- 指定列压缩算法,gpdb 不支持该语法,使用全局压缩算法代替

BYTEDICT
DELTA
DELTA32K
LZO
MOSTLY8
MOSTLY16
MOSTLY32
RAW (no compression)
RUNLENGTH
TEXT255
TEXT32K
ZSTD
去掉并转换为:

create table test (id int)
with (COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}) -- 选择一个压缩算法
;
2.3、

[ DISTKEY ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替
去掉并转换为:

create table test (id int, info text)
distributed by (id); -- 分布列名与DISTKEY对应列名一致
2.4、

[ SORTKEY ] -- gpdb 不支持,阿里云hdb pg支持,语法参考后面的转换规则。
去掉并转换为:

create table test(date text, time text, open float, high float, low float, volume int)
with(APPENDONLY=true,ORIENTATION=column)
sortkey (volume); -- 这里指定sortkey

ALTER [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name
SET SORTKEY (column, [ ... ] ); -- 这里指定sortkey
如果设置了sort key,在数据导入完成后,执行如下SQL,(堵塞DDL,DML,DSL,重新对数据排序)

VACUUM SORT ONLY [tablename]
and table_attributes are:

2.5、

[ DISTSTYLE { EVEN | KEY | ALL } ] -- gpdb 不支持该语法,even使用distributed randomly代替,key使用distributed by (colname1,...) 代替,ALL不支持(维度表,所有SEGMENT都有一份)。
去掉并转换为:

distributed randomly

distributed by (colname1, ...)
2.6、

[ DISTKEY ( column_name ) ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替
去掉并转换为:

distributed by(colname1, ...)
2.7、

[ [COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ] -- gpdb 不支持,阿里云HDB PG支持sortkey,语法参考后面的转换规则
同2.4。

COMPOUND表示完全按用户指定的字段排序,类似PG的CLUSTER。

INTERLEAVED表示按维度排序,任意列的顺序都是公平的,类似多维空间聚集存放。

参考
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html

https://gpdb.docs.pivotal.io/43300/ref_guide/sql_commands/CREATE_TABLE.html

https://help.aliyun.com/knowledge_detail/59195.html
转自阿里云德哥

上一篇:Serverless架构在软件工程的研发实践—JAMStack原理介绍


下一篇:AWS redshift->hdb pg(Greenplum), 内置函数、数据类型、字符集