Greenplum distribution keys

The main purpose of a Greenplum (gp) distribution key is to avoid data skew:

  1. Specify the distribution key explicitly. Do not rely on the default distribution key, which is what you get when the CREATE TABLE statement does not declare one.

  2. The values of the distribution key must be able to spread evenly across all nodes.
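The two rules above can be illustrated with a small simulation. This is only a sketch: the hash used here (MD5 modulo the segment count) is a stand-in and not Greenplum's actual hash function, and `segment_for`, `rows_per_segment`, and the cluster size of 8 are hypothetical names and values chosen for the example. In Greenplum itself the key is declared with the `DISTRIBUTED BY` clause of CREATE TABLE.

```python
import hashlib
from collections import Counter


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment.

    Illustrative hash only; Greenplum uses its own hash function.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def rows_per_segment(keys, num_segments):
    """Count how many rows each segment would receive."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return [counts.get(s, 0) for s in range(num_segments)]


NUM_SEGMENTS = 8             # hypothetical cluster size
unique_ids = range(100_000)  # high-cardinality key, such as a primary key

per_segment = rows_per_segment(unique_ids, NUM_SEGMENTS)
print(per_segment)           # roughly equal counts on every segment
```

A key with many distinct values gives every segment a similar share of the rows, which is what rule 2 asks for.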

A data skew incident I once caused myself

  Environment: the test environment; the table was created with a date column as its distribution key, and the table was not compressed.

  Status: I was extracting data into a table I had built myself. Dates in the test data warehouse are kept only to the day, and the data volume ran to billions of records.

  Result: all of the data piled onto a single node, taking up 1.6 TB of storage.

  Impact: two nodes were knocked offline, queries against this table hung, and TRUNCATE on the table hung as well.
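A scaled-down sketch of how an incident like this can arise, assuming every extracted row carries the same date value. The hash is again an illustrative stand-in for Greenplum's own, and the date string, row count, segment count, and `skew_ratio` helper are hypothetical values and names made up for the example.

```python
import hashlib
from collections import Counter


def segment_for(key, num_segments):
    """Illustrative hash placement; not Greenplum's real hash function."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def skew_ratio(counts, num_segments):
    """Largest segment's row count divided by the ideal even share."""
    ideal = sum(counts.values()) / num_segments
    return max(counts.values()) / ideal


NUM_SEGMENTS = 8   # hypothetical cluster size
ROWS = 50_000      # scaled-down stand-in for billions of records

# Distribution key = date, and every row has the same (hypothetical) date.
date_keys = ["2019-12-01"] * ROWS
by_date = Counter(segment_for(k, NUM_SEGMENTS) for k in date_keys)

# Distribution key = a unique row id, for comparison.
by_id = Counter(segment_for(k, NUM_SEGMENTS) for k in range(ROWS))

print(len(by_date), skew_ratio(by_date, NUM_SEGMENTS))  # 1 segment used, ratio 8.0
print(len(by_id), round(skew_ratio(by_id, NUM_SEGMENTS), 2))
```

Identical key values always hash to the same segment, so a single-valued key puts every row on one node no matter how many nodes exist. On a real cluster, grouping a table's rows by the `gp_segment_id` system column shows the actual per-segment counts.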

1. If the table exists for storage: the distribution key should do its job of spreading the data, so use the primary key column, if there is one, as the distribution key.

2. If the table exists for computation: the distribution key should be the join key, and for efficiency it should contain no more than three columns.
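The reasoning behind point 2 can be sketched with a toy model: when two tables are distributed on the same join key, matching rows land on the same segment and can be joined locally. The tables, column names, id ranges, and helper functions below are hypothetical, and the hash is an illustrative stand-in; the real Greenplum planner decides when rows must be moved between segments.

```python
import hashlib


def segment_for(key, num_segments):
    """Illustrative hash placement; not Greenplum's real hash function."""
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


NUM_SEGMENTS = 8  # hypothetical cluster size

# Hypothetical tables joined on customer_id.
customers = [{"customer_id": c} for c in range(1000)]
orders = [{"order_id": o, "customer_id": o % 1000}
          for o in range(10_000, 15_000)]


def place(rows, key_columns):
    """Assign each row to a segment by hashing the tuple of key columns."""
    return [(segment_for(tuple(r[c] for c in key_columns), NUM_SEGMENTS), r)
            for r in rows]


def rows_needing_motion(orders_key):
    """Count order rows that sit on a different segment than their customer."""
    cust_segment = {r["customer_id"]: s
                    for s, r in place(customers, ["customer_id"])}
    return sum(1 for s, r in place(orders, orders_key)
               if s != cust_segment[r["customer_id"]])


print(rows_needing_motion(["customer_id"]))  # 0: join key is the distribution key
print(rows_needing_motion(["order_id"]))     # most rows would have to move
```

In this model, distributing `orders` on the join key leaves nothing to move, while distributing it on `order_id` scatters matching rows across segments.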

 


Origin www.cnblogs.com/zhaoqian49/p/11983445.html