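0 Partitioned tables

A partitioned table corresponds to a separate directory on the HDFS file system, and all data files of a partition are stored under that directory. A Hive partition is a subdirectory: a large data set is split into smaller ones according to business needs. When a query names the partitions it needs in its WHERE clause, Hive reads only those partitions, which makes the query much more efficient.

1 Basic operations on partitioned tables

1) Why partitioned tables (logs need to be managed by date; department data is used here to simulate them)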

dept_20200401.log

dept_20200402.log

dept_20200403.log

2) Syntax for creating a partitioned table

hive (default)> create table dept_partition(
deptno int, dname string, loc string
)
partitioned by (day string)
row format delimited fields terminated by '\t';

Note: the partition field must not be a column that already exists in the table. It can be treated as a pseudo-column of the table.

3) Load data into the partitioned table

Prepare the data

dept_20200401.log

10	ACCOUNTING	1700

20	RESEARCH	1800

dept_20200402.log

30	SALES	1900

40	OPERATIONS	1700

dept_20200403.log

50	TEST	2000

60	DEV	1900

Load the data

hive (default)> load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table dept_partition partition(day='20200401');

hive (default)> load data local inpath '/opt/module/hive/datas/dept_20200402.log' into table dept_partition partition(day='20200402');

hive (default)> load data local inpath '/opt/module/hive/datas/dept_20200403.log' into table dept_partition partition(day='20200403');

Note: when loading data into a partitioned table, you must specify the partition.

4) Query data in the partitioned table

Single-partition query

hive (default)> select * from dept_partition where day='20200401';
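Because the partition field behaves like a pseudo-column, it can also appear in the select list or in GROUP BY, not only in WHERE. A minimal sketch, assuming the three log files were loaded into dept_partition as shown earlier:

```sql
-- day is not stored in the data files; Hive derives it from the partition each row lives in.
select deptno, dname, day from dept_partition where day = '20200401';

-- Row count per partition.
select day, count(*) as cnt from dept_partition group by day;
```

The first query still touches only the day=20200401 partition, because the filter is on the partition field.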

Combined query over multiple partitions

hive (default)> select * from dept_partition where day='20200401'
              union
              select * from dept_partition where day='20200402'
              union
              select * from dept_partition where day='20200403';

hive (default)> select * from dept_partition where day='20200401' or
                day='20200402' or day='20200403';
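The OR form can be written more compactly. The sketch below assumes the same dept_partition table. The BETWEEN variant relies on day being a fixed-width yyyyMMdd string, so that string order matches date order. Also note that in Hive 1.2.0 and later a plain UNION removes duplicate rows, so the UNION query matches the filter queries only when the rows are unique.

```sql
-- Same three partitions, selected with IN.
select * from dept_partition where day in ('20200401', '20200402', '20200403');

-- Range form; works because the values sort lexicographically in date order.
select * from dept_partition where day between '20200401' and '20200403';
```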

5) Add partitions

Create a single partition

hive (default)> alter table dept_partition add partition(day='20200404');

Create several partitions at once

hive (default)> alter table dept_partition add partition(day='20200405') partition(day='20200406');

6) Drop partitions

Drop a single partition

hive (default)> alter table dept_partition drop partition (day='20200406');

Drop several partitions at once

hive (default)> alter table dept_partition drop partition (day='20200404'), partition(day='20200405');
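ADD PARTITION raises an error when the partition already exists, and DROP PARTITION may raise one when the partition is missing, depending on configuration. For scripts that are run more than once, the IF NOT EXISTS and IF EXISTS clauses make both statements safe to repeat. A small sketch using the same table:

```sql
-- Does nothing if day=20200404 is already registered.
alter table dept_partition add if not exists partition(day='20200404');

-- Does not raise an error if day=20200404 is absent.
alter table dept_partition drop if exists partition(day='20200404');
```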

7) Show the partitions of the partitioned table

hive> show partitions dept_partition;

8) View the structure of the partitioned table

hive> desc formatted dept_partition;

# Partition Information         

# col_name              data_type               comment            

day                     string

2 Two-level partitions

Question: how can the log data within a single day be split further?

1) Create a two-level partitioned table

hive (default)> create table dept_partition2(
               deptno int, dname string, loc string
               )
               partitioned by (day string, hour string)
               row format delimited fields terminated by '\t';

2) Load data the normal way

(1) Load data into the two-level partitioned table

hive (default)> load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table dept_partition2 partition(day='20200401', hour='12');

(2) Query the partition data

hive (default)> select * from dept_partition2 where day='20200401' and hour='12';
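With two partition levels, a filter on the first level alone is also possible: it covers every hour partition under that day. SHOW PARTITIONS accepts a partial partition spec in the same way. A brief sketch, assuming the dept_partition2 table above:

```sql
-- Reads every hour partition under day=20200401.
select * from dept_partition2 where day='20200401';

-- Lists only the partitions under day=20200401.
show partitions dept_partition2 partition(day='20200401');
```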

3) Uploading data directly into a partition directory: three ways to associate the data with the partitioned table

(1) Method 1: upload the data, then repair

Upload the data

hive (default)> dfs -mkdir -p /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=13;

hive (default)> dfs -put /opt/module/hive/datas/dept_20200401.log /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=13;

Query the data (the data just uploaded cannot be queried yet)

hive (default)> select * from dept_partition2 where day='20200401' and hour='13';

Run the repair command

hive> msck repair table dept_partition2;

Query the data again

hive (default)> select * from dept_partition2 where day='20200401' and hour='13';
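The first query returns nothing because the metastore has no record of the new directory. MSCK REPAIR TABLE scans the table's location for partition directories that are missing from the metastore and registers them. One way to observe this when following along is to compare SHOW PARTITIONS before and after the repair. This is a sketch that assumes the directory was created under the table's actual location:

```sql
-- Before the repair, the metastore does not know about day=20200401/hour=13.
show partitions dept_partition2;

msck repair table dept_partition2;

-- After the repair, day=20200401/hour=13 should be listed.
show partitions dept_partition2;
```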

(2) Method 2: upload the data, then add the partition

Upload the data

hive (default)> dfs -mkdir -p /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=14;

hive (default)> dfs -put /opt/module/hive/datas/dept_20200401.log  /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=14;

Add the partition

hive (default)> alter table dept_partition2 add partition(day='20200401',hour='14');

Query the data

hive (default)> select * from dept_partition2 where day='20200401' and hour='14';

(3) Method 3: create the directory, then load data into the partition

Create the directory

hive (default)> dfs -mkdir -p /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=15;

Load the data

hive (default)> load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table dept_partition2 partition(day='20200401',hour='15');

Query the data

hive (default)> select * from dept_partition2 where day='20200401' and hour='15';

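3 Dynamic partitioning

In a relational database, when data is inserted into a partitioned table, the database automatically places each row in the right partition based on the value of the partition field. Hive provides a similar mechanism, dynamic partitioning (Dynamic Partition), but it has to be configured before use.

1) Dynamic partition parameters

(1) Enable the dynamic partition feature (default: true, i.e. enabled)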

hive.exec.dynamic.partition=true

(2) Set non-strict mode (the dynamic partition mode; the default is strict, which requires at least one partition to be specified as a static partition, while nonstrict allows every partition field to be dynamic)

hive.exec.dynamic.partition.mode=nonstrict

(3) The maximum total number of dynamic partitions that can be created across all nodes running the MR job. Default: 1000

hive.exec.max.dynamic.partitions=1000

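(4) The maximum number of dynamic partitions that can be created on each node running the MR job. Set this parameter according to the actual data. For example, if the source data covers one year, so that the day field has 365 distinct values, the parameter must be set to a value greater than 365; with the default of 100 the job reports an error.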

hive.exec.max.dynamic.partitions.pernode=100

(5) The maximum number of HDFS files that can be created by the whole MR job. Default: 100000

hive.exec.max.created.files=100000

(6) Whether to throw an exception when an empty partition is generated. This usually does not need to be set. Default: false

hive.error.on.empty.partition=false
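These parameters can be set for the current session with set before running the insert. The values below are only an illustration for a source with about 365 distinct day values; the figure 400 is an assumption, not a recommendation from the original text.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- Illustrative limits: the per-node limit must exceed the number of
-- distinct partition values a single task may produce (here about 365).
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=400;
```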

2) Worked example

Requirement: insert the data of the dept table into the corresponding partitions of the target table dept_partition_dy according to region (the loc field).

(1) Create the target partitioned table

hive (default)> create table dept_partition_dy(id int, name string) partitioned by (loc int) row format delimited fields terminated by '\t';

(2) Configure dynamic partitioning and insert the data

set hive.exec.dynamic.partition.mode = nonstrict;

hive (default)> insert into table dept_partition_dy partition(loc) select deptno, dname, loc from dept;

(3) View the partitions of the target table

hive (default)> show partitions dept_partition_dy;

Question: how does the target partitioned table match the partition field?
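As far as Hive's documented behaviour goes, the match is by position, not by name: the dynamic partition columns must come last in the SELECT list, in the same order as in the PARTITION clause. The sketch below reuses the dept and dept_partition_dy tables from the example; the alias region_code is a made-up name chosen only to show that the name plays no role.

```sql
set hive.exec.dynamic.partition.mode = nonstrict;

-- deptno and dname fill id and name by position; the last SELECT column
-- feeds the partition column loc, whatever it is called.
insert into table dept_partition_dy partition(loc)
select deptno, dname, loc as region_code from dept;
```

The same rule explains why the original insert works even though the source columns deptno and dname are named differently from the target columns id and name.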
