Hive practice homework 1

1. How to turn off the firewall?
First, switch to the root user:
su - root
Then stop the firewall and disable it so that it does not start at boot:
systemctl stop firewalld.service
systemctl disable firewalld.service

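2. Which file needs to be edited to change the IP address?
vim /etc/sysconfig/network-scripts/ifcfg-ens33
Change the BOOTPROTO line (the fourth line in this file) to BOOTPROTO="static". The file name suffix (ens33 here) is the network interface name and may differ on other machines.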

3. What are the complex data types in Hive?
struct, map, array
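
As a minimal sketch of how the three types are declared and read, the example below uses a hypothetical table employee_demo; the table name, column names and delimiters are assumptions chosen for illustration, not part of the original homework.

```sql
-- Hypothetical table: all names and delimiters here are assumptions.
create table if not exists employee_demo (
  name    string,
  skills  array<string>,                    -- array: ordered list of values of one type
  scores  map<string, int>,                 -- map: key/value pairs
  address struct<city:string, zip:string>   -- struct: fixed set of named fields
)
row format delimited
  fields terminated by ','
  collection items terminated by '_'
  map keys terminated by ':'
  lines terminated by '\n';

-- Accessing each complex type
select
  skills[0],        -- array element by index (indexes start at 0)
  scores['math'],   -- map value by key
  address.city      -- struct field by dot notation
from employee_demo;
```

With these delimiters, a matching line in the data file would look like: tom,java_hive,math:90_english:80,beijing_100000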

4. Write the complete syntax for creating a table in Hive

create [external] table [if not exists] table_name
(col_name data_type [comment col_comment], ...)
[comment table_comment]
[partitioned by (col_name data_type [comment col_comment], ...)]
[clustered by (col_name, col_name, ...)
[sorted by (col_name [asc|desc], ...)] into num_buckets buckets]
[row format row_format]
[stored as file_format]
[location hdfs_path]
[tblproperties (property_name=property_value, ...)]
[as select_statement]

Note:
① external creates an external table. For an internal table, data loaded into the table is moved to the data warehouse path; for an external table, Hive only records the path where the data is located and does not move the data.
② partitioned by creates a partitioned table.
③ clustered by creates a bucketed table.
④ sorted by sorts the rows within each bucket; it is not commonly used.
⑤ row format delimited [fields terminated by char] [collection items terminated by char] [map keys terminated by char] [lines terminated by char]
⑥ stored as specifies the file storage format (sequencefile: binary file; textfile: plain text file; rcfile: columnar storage format).
⑦ location specifies the storage location of the table on HDFS.
⑧ like (written as create table new_table like existing_table) copies the structure of an existing table but does not copy its data.
⑨ as is followed by a query statement and creates the table from the query result.
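
The clauses above can be combined roughly as follows. This is a sketch under assumptions: page_view_demo, its columns, the bucket count and the HDFS path are all hypothetical.

```sql
-- Hypothetical external table that is both partitioned and bucketed.
create external table if not exists page_view_demo (
  user_id  bigint comment 'visitor id',
  page_url string comment 'visited page'
)
comment 'hypothetical page view log'
partitioned by (dt string comment 'date partition')
clustered by (user_id) sorted by (user_id asc) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile
location '/tmp/hive_demo/page_view_demo';   -- assumed HDFS directory

-- like: copy only the table structure, no data
create table if not exists page_view_copy like page_view_demo;

-- as: create a table from a query result
create table if not exists page_view_sample as
select user_id, page_url
from page_view_demo
where dt = '2020-10-30';
```

Note that the partition column dt is declared only in partitioned by and not in the regular column list.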

5. The difference between internal and external tables
Tables created by default are so-called managed tables, sometimes called internal tables, because Hive controls (more or less) the life cycle of their data. When a managed table is dropped, Hive also deletes the data in the table. Managed tables are therefore inconvenient for sharing data with other tools.

External tables:
  1. Are declared with the keyword external
  2. Can specify the path where the table's data is stored
  3. If the storage path is not specified, Hive creates a folder named after the external table under /user/hive/warehouse on HDFS and stores all the data belonging to the table there
  4. When an external table is dropped, only the table's metadata is deleted; the data is not deleted
  5. In production, external tables are generally used to store data
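
The behaviour described above can be sketched as follows, assuming a hypothetical table ext_log_demo and a hypothetical HDFS directory /data/ext_log_demo.

```sql
-- Hypothetical external table pointing at an assumed HDFS directory.
create external table if not exists ext_log_demo (
  line string
)
location '/data/ext_log_demo';

-- The output includes the table type (EXTERNAL_TABLE) and the location.
describe formatted ext_log_demo;

-- Removes only the metadata; the files under /data/ext_log_demo stay on HDFS.
drop table ext_log_demo;
```

Running the same create statement again would make the existing files queryable once more, which is why external tables suit data shared with other tools.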

6. How to clear the data in a Hive table
truncate table table_name;

7. What is the difference between a static partition table and a dynamic partition table?
The main difference is that with static partitioning the partition value is specified manually, while with dynamic partitioning it is determined from the data.
Static partition values are passed in by the user and are known at compile time; dynamic partition values can only be determined when the SQL is executed.
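
A short sketch of the two styles, assuming a hypothetical source table sales_raw (id, amount, dt) and a hypothetical partitioned target table sales_part:

```sql
-- Hypothetical tables used only for this illustration.
create table if not exists sales_raw (id int, amount double, dt string);
create table if not exists sales_part (id int, amount double)
partitioned by (dt string);

-- Static partition: the partition value is written by hand in the statement.
insert into table sales_part partition (dt = '2020-10-30')
select id, amount
from sales_raw
where dt = '2020-10-30';

-- Dynamic partition: the value is taken from the data at execution time.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partition columns to be dynamic

insert into table sales_part partition (dt)
select id, amount, dt   -- the partition column must come last in the select list
from sales_raw;
```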

8. Which types of queries does Hive's strict mode restrict?
1) Queries on partitioned tables must have a partition filter condition in the where clause;
2) Queries that use order by must also use limit to bound the result;
3) Queries that produce a Cartesian product are restricted.
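
The three restrictions can be illustrated as below. The tables orders_demo and users_demo are hypothetical, and the sketch assumes a Hive version where strict mode is switched on through hive.mapred.mode; newer releases expose finer-grained hive.strict.checks.* settings instead, so check the documentation for the version in use.

```sql
-- Hypothetical tables used only for this illustration.
create table if not exists users_demo (id int, name string);
create table if not exists orders_demo (id int, user_id int, amount double)
partitioned by (dt string);

set hive.mapred.mode=strict;

-- 1) Rejected in strict mode: no partition filter on a partitioned table
--    select * from orders_demo;
select * from orders_demo where dt = '2020-10-30';

-- 2) Rejected in strict mode: order by without limit
--    select * from orders_demo where dt = '2020-10-30' order by amount desc;
select * from orders_demo
where dt = '2020-10-30'
order by amount desc
limit 10;

-- 3) Rejected in strict mode: join without a join condition (Cartesian product)
--    select * from orders_demo o join users_demo u where o.dt = '2020-10-30';
select o.id, u.name
from orders_demo o
join users_demo u on o.user_id = u.id
where o.dt = '2020-10-30';
```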

9. How to import data from HDFS into a Hive table

load data inpath 'hdfs_path' into table table_name [partition (partcol1=val1, ...)];
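
For example, assuming a hypothetical partitioned table access_log_demo and hypothetical file paths on HDFS:

```sql
-- Hypothetical target table.
create table if not exists access_log_demo (line string)
partitioned by (dt string);

-- Without the local keyword the path is an HDFS path, and the file is
-- moved (not copied) into the table's partition directory.
load data inpath '/tmp/hive_demo/access_1030.log'
into table access_log_demo partition (dt = '2020-10-30');

-- overwrite replaces the existing contents of the target partition.
load data inpath '/tmp/hive_demo/access_1030_fixed.log'
overwrite into table access_log_demo partition (dt = '2020-10-30');
```

Because the load moves the file, the source path no longer contains it afterwards.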

Origin blog.csdn.net/weixin_42224488/article/details/109381347