Big Data Platform Operations: Hive

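Start the Hive data warehouse on the big data platform, launch the Hive client, and use Hive to list all file paths in Hadoop (write all database commands in lowercase). Submit the query result as text in the answer box.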

[root@master ~]# hive

WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties

hive> dfs -ls;

Found 5 items

drwx------ - root hdfs 0 2017-04-20 18:56 .Trash

drwxr-xr-x - root hdfs 0 2017-05-07 05:59 .hiveJars

drwx------ - root hdfs 0 2017-05-07 05:43 .staging

drwxr-xr-x - root hdfs 0 2017-05-07 05:43 hbase-staging

drwxr-xr-x - root hdfs 0 2017-04-20 18:56 samll-file

27. Use Hive to create the table xd_phy_course and load phy_course_xd.txt into it; the schema of xd_phy_course is given in the table below. After the load completes, use Hive to list the HDFS location of the data files for xd_phy_course. Submit the commands (write all database commands in lowercase) and their output as text in the answer box.

Updated answer:

hive> create table xd_phy_course (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

Time taken: 4.067 seconds

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

Time taken: 1.422 seconds

hive> dfs -ls /apps/hive/warehouse;

Found 1 items

drwxrwxrwx - hive hdfs 0 2017-05-19 03:30 /apps/hive/warehouse/xd_phy_course

28. Use Hive to create the table xd_phy_course as an external table stored at /1daoyun/data/hive, and load phy_course_xd.txt into it; the schema of xd_phy_course is given in the table below. After the load completes, query the structure of xd_phy_course in Hive. Submit the commands (write all database commands in lowercase) and their output as text in the answer box.

hive> create external table xd_phy_course (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/1daoyun/data/hive';

Time taken: 1.197 seconds

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

Time taken: 0.96 seconds

hive> desc xd_phy_course;

stname string

stid int

class string

opt_cour string

Time taken: 0.588 seconds, Fetched: 4 row(s)

29. Use Hive to find all records in phy_course_xd.txt for students of class Software_1403 at a certain university who signed up for volleyball. The schema of phy_course_xd.txt is given in the table below; the elective field is opt_cour and the class field is class. Submit the commands (write all database commands in lowercase) and their output as text in the answer box.

Updated answer:

hive> create table xd_phy_course (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';

Time taken: 4.067 seconds

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

Time taken: 1.422 seconds

hive> select * from xd_phy_course where class='Software_1403' and opt_cour='volleyball';

student409 10120408 Software_1403 volleyball

student411 10120410 Software_1403 volleyball

student413 10120412 Software_1403 volleyball

student419 10120418 Software_1403 volleyball

student421 10120420 Software_1403 volleyball

student422 10120421 Software_1403 volleyball

student424 10120423 Software_1403 volleyball

student432 10120431 Software_1403 volleyball

student438 10120437 Software_1403 volleyball

student447 10120446 Software_1403 volleyball

Time taken: 0.985 seconds, Fetched: 10 row(s)

30. Use Hive to count, from phy_course_xd.txt, the total number of students at a certain university who signed up for each sports elective. The schema of phy_course_xd.txt is given in the table below; the elective field is opt_cour. Write the result into the table phy_opt_count, then query phy_opt_count with a select statement. Submit the counting statement and the query (write all database commands in lowercase) together with their output as text in the answer box.

hive> create table xd_phy_course (stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';

Time taken: 4.067 seconds

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

Time taken: 1.422 seconds

hive> create table phy_opt_count (opt_cour string,cour_count int) row format delimited fields terminated by '\t' lines terminated by '\n';

Time taken: 1.625 seconds

hive> insert overwrite table phy_opt_count select xd_phy_course.opt_cour, count(distinct xd_phy_course.stID) from xd_phy_course group by xd_phy_course.opt_cour;

Query ID = root_20170507125642_6af22d21-ae88-4daf-a346-4b1cbcd7d9fe

Total jobs = 1

Launching Job 1 out of 1

Tez session was closed. Reopening...

Session re-established.

Status: Running (Executing on YARN cluster with App id application_1494149668396_0004)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 4.51 s

--------------------------------------------------------------------------------

Loading data to table default.phy_opt_count

Table default.phy_opt_count stats: [numFiles=1, numRows=10, totalSize=138, rawDataSize=128]

Time taken: 13.634 seconds

hive> select * from phy_opt_count;

badminton 234

basketball 224

football 206

gymnastics 220

opt_cour 0

swimming 234

table tennis 277

taekwondo 222

tennis 223

volleyball 209

Time taken: 0.065 seconds, Fetched: 10 row(s)
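
The `opt_cour 0` row in the result most likely comes from a header line in phy_course_xd.txt: Hive loads it as an ordinary record, its stID text cannot be parsed as an int and becomes NULL, and count(distinct ...) ignores NULL values. The Python sketch below imitates that behaviour on a few made-up rows. The sample data and the helper names are assumptions for illustration only and are not part of the original exercise.

```python
from collections import defaultdict

# Made-up tab-separated sample; the first line imitates a header row.
SAMPLE_LINES = [
    "stname\tstID\tclass\topt_cour",
    "student1\t10120001\tSoftware_1403\tvolleyball",
    "student2\t10120002\tSoftware_1403\tvolleyball",
    "student3\t10120003\tNetwork_1401\ttennis",
]


def to_int_or_none(text):
    """Imitate Hive turning unparsable int text into NULL (None here)."""
    try:
        return int(text)
    except ValueError:
        return None


def count_distinct_by_course(lines):
    """Rough local equivalent of:
    select opt_cour, count(distinct stID) from ... group by opt_cour
    """
    ids_per_course = defaultdict(set)
    for line in lines:
        _stname, stid, _class_name, opt_cour = line.split("\t")
        ids_per_course[opt_cour]  # create the group even when the id is NULL
        student_id = to_int_or_none(stid)
        if student_id is not None:
            ids_per_course[opt_cour].add(student_id)
    return {course: len(ids) for course, ids in sorted(ids_per_course.items())}


if __name__ == "__main__":
    for course, total in count_distinct_by_course(SAMPLE_LINES).items():
        print(course, total)
```

If the header row is not wanted in the table, one option is Hive's `skip.header.line.count` table property, set when the table is created.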

31. Use Hive to find all records in phy_course_score_xd.txt for students of class Software_1403 at a certain university whose sports elective score is above 90. The schema of phy_course_score_xd.txt is given in the table below; the elective field is opt_cour and the score field is score. Submit the commands (write all database commands in lowercase) and their output as text in the answer box.

hive> create table phy_course_score_xd (stname string,stID int,class string,opt_cour string,score float) row format delimited fields terminated by '\t' lines terminated by '\n';

Time taken: 0.339 seconds

hive> load data local inpath '/root/phy_course_score_xd.txt' into table phy_course_score_xd;

Loading data to table default.phy_course_score_xd

Table default.phy_course_score_xd stats: [numFiles=1, totalSize=1910]

Time taken: 1.061 seconds

hive> select * from phy_course_score_xd where class='Software_1403' and score>90;

student433 10120432 Software_1403 football 98.0

student434 10120433 Software_1403 table tennis 97.0

student438 10120437 Software_1403 volleyball 93.0

student439 10120438 Software_1403 badminton 100.0

student444 10120443 Software_1403 swimming 99.0

student445 10120444 Software_1403 table tennis 97.0

student450 10120449 Software_1403 basketball 97.0

Time taken: 0.21 seconds, Fetched: 7 row(s)

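32. Use Hive to compute, from phy_course_score_xd.txt, the average sports score of each class at a certain university, using the round function to keep two decimal places. The schema of phy_course_score_xd.txt is given in the table below; the class field is class and the score field is score. Submit the commands (write all database commands in lowercase) and their output as text in the answer box.

hive> select class,round(avg(score)) from phy_course_score_xd group by class;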

Query ID = root_20170507131823_0bfb1faf-3bfb-42a5-b7eb-3a6a284081ae

Total jobs = 1

Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 26.68 s

--------------------------------------------------------------------------------

Network_1401 73.0

Software_1403 72.0

class NULL

Time taken: 27.553 seconds, Fetched: 3 row(s)
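
The task asks for two decimal places, whereas the query recorded above calls round with a single argument, which rounds to a whole number (hence 73.0 and 72.0). Passing the number of decimals as a second argument, as in round(avg(score), 2), would keep two places. The Python sketch below shows the difference on made-up scores. The data and function names are assumptions, and Decimal with ROUND_HALF_UP is used only to approximate Hive's rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Made-up (class, score) pairs for illustration only.
SAMPLE_SCORES = [
    ("Network_1401", 70.0),
    ("Network_1401", 75.5),
    ("Network_1401", 73.0),
    ("Software_1403", 72.0),
    ("Software_1403", 71.0),
]


def round_half_up(value, places=0):
    """Round half up to the given number of decimal places."""
    quantum = Decimal(10) ** -places  # 1, 0.1, 0.01, ...
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


def average_by_class(rows, places=0):
    """Rough local equivalent of:
    select class, round(avg(score), places) from ... group by class
    """
    totals = {}
    for class_name, score in rows:
        total, count = totals.get(class_name, (0.0, 0))
        totals[class_name] = (total + score, count + 1)
    return {
        name: round_half_up(total / count, places)
        for name, (total, count) in totals.items()
    }


if __name__ == "__main__":
    print(average_by_class(SAMPLE_SCORES))      # like round(avg(score))
    print(average_by_class(SAMPLE_SCORES, 2))   # like round(avg(score), 2)
```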

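33. Use Hive to find, from phy_course_score_xd.txt, the highest sports score of each class at a certain university. The schema of phy_course_score_xd.txt is given in the table below; the class field is class and the score field is score. Submit the commands (write all database commands in lowercase) and their output as text in the answer box.

hive> select class,max(score) from phy_course_score_xd group by class;

Query ID = root_20170507131942_86a2bf55-49ac-4c2e-b18b-8f63191ce349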

Total jobs = 1

Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.08 s

--------------------------------------------------------------------------------

Network_1401 95.0

Software_1403 100.0

class NULL

Time taken: 144.035 seconds, Fetched: 3 row(s)

34. In the Hive data warehouse, merge the separate request_date and request_time fields of the web log weblog_entries.txt into one field, separated by a single underscore "_", as shown in the figure below; the schema of weblog_entries.txt is given in the table below. Submit the commands (write all database commands in lowercase) and the last ten rows of output as text in the answer box.

hive> create external table weblog_entries (md5 string,url string,request_date string,request_time string,ip string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/weblog/';

Time taken: 0.502 seconds

hive> load data local inpath '/root/weblog_entries.txt' into table weblog_entries;

Loading data to table default.weblog_entries

Table default.weblog_entries stats: [numFiles=1, totalSize=251130]

Time taken: 1.203 seconds

hive> select concat_ws('_', request_date,request_time) from weblog_entries;

2012-05-10_21:29:01

2012-05-10_21:13:47

2012-05-10_21:12:37

2012-05-10_21:34:20

2012-05-10_21:27:00

2012-05-10_21:33:53

2012-05-10_21:10:19

2012-05-10_21:12:05

2012-05-10_21:25:58

2012-05-10_21:34:28

Time taken: 0.265 seconds, Fetched: 3000 row(s)
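
As a quick local check of what concat_ws('_', request_date, request_time) produces, the following Python sketch joins two fields with an underscore. The function name is an assumption chosen to mirror the Hive call, and the sample pairs are taken from the output above.

```python
def concat_ws(separator, *parts):
    """Join the given string parts with the separator, like the Hive call above."""
    return separator.join(parts)


# (request_date, request_time) pairs taken from the output listed above.
SAMPLE_ROWS = [
    ("2012-05-10", "21:29:01"),
    ("2012-05-10", "21:13:47"),
]

if __name__ == "__main__":
    for request_date, request_time in SAMPLE_ROWS:
        print(concat_ws("_", request_date, request_time))
```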

35. In the Hive data warehouse, perform a simple inner join between the IP field of the web log weblog_entries.txt and the country recorded for the matching IP in ip_to_country; the output is shown in the figure below, and the schema of weblog_entries.txt is given in the table below. Submit the commands (write all database commands in lowercase) and the last ten rows of output as text in the answer box.

hive> create table ip_to_country (ip string,country string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/ip_to_county/';

Time taken: 0.425 seconds

hive> load data local inpath '/root/ip_to_country.txt' into table ip_to_country;

Loading data to table default.ip_to_country

Table default.ip_to_country stats: [numFiles=1, totalSize=75728]

Time taken: 2.016 seconds

hive> select wle.*,itc.country from weblog_entries wle join ip_to_country itc on wle.ip=itc.ip;

Query ID = root_20170507064740_a52870a0-2405-4fd4-85c2-43f8a229b3c3

Total jobs = 1

Launching Job 1 out of 1

Tez session was closed. Reopening...

Session re-established.

Status: Running (Executing on YARN cluster with App id application_1494136863427_0002)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Map 2 .......... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 6.30 s

--------------------------------------------------------------------------------

3e8146764aefe5d87353dd4e0ae9ac5 /qnrxlxqacgiudbtfggcg.html 2012-05-10 21:29:01 164.210.124.152 United States

fdb388d28c8466d4eb7d93677af194 /sbbiuot.html 2012-05-10 21:13:47 168.17.158.38 United States

4a1a345f85fa5fa2659e27f623dff11 /ofxi.html 2012-05-10 21:12:37 174.24.173.11 United States

6a09d25407766a7bb8653d359feca4 /hjmdhaoogwqhp.html 2012-05-10 21:34:20 143.64.173.176 United States

aeecff9b31d1134c8843248bedbca5bd /angjbmea.html 2012-05-10 21:27:00 160.164.158.125 Italy

f61954aad39de057cd6f51ba3deed241 /mmdttqsnjfifkihcvqu.html 2012-05-10 21:33:53 15.111.128.4 United States

7cdf2c1efd653867278417dd465c1a65 /eorxuryjadhkiwsf.html 2012-05-10 21:10:19 22.71.176.163 United States

22b2549649dcc284ba8bf7d4993ac62 /e.html 2012-05-10 21:12:05 105.57.100.182 Morocco

3ab7888ffe27c2f98d48eb296449d5 /khvc.html 2012-05-10 21:25:58 111.147.83.42 China

65827078a9f7ccce59632263294782db /c.html 2012-05-10 21:34:28 137.157.65.89 Australia

Time taken: 15.331 seconds, Fetched: 3000 row(s)
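
The inner join keeps only the log rows whose ip has a match in ip_to_country and appends the matching country. A minimal Python sketch of that idea follows. The first two log rows and both country pairs are taken from the output above; the third log row, its address, and all function and variable names are hypothetical and exist only to show a row being dropped.

```python
# (url, request_date, request_time, ip); the md5 column is left out for brevity.
WEBLOG_ROWS = [
    ("/qnrxlxqacgiudbtfggcg.html", "2012-05-10", "21:29:01", "164.210.124.152"),
    ("/angjbmea.html", "2012-05-10", "21:27:00", "160.164.158.125"),
    ("/made-up.html", "2012-05-10", "21:00:00", "203.0.113.7"),  # hypothetical, no match
]

# (ip, country)
IP_TO_COUNTRY = [
    ("164.210.124.152", "United States"),
    ("160.164.158.125", "Italy"),
]


def inner_join_on_ip(weblog_rows, ip_country_rows):
    """Rough local equivalent of:
    select wle.*, itc.country from weblog_entries wle
    join ip_to_country itc on wle.ip = itc.ip
    """
    countries_by_ip = {}
    for ip, country in ip_country_rows:
        countries_by_ip.setdefault(ip, []).append(country)

    joined = []
    for row in weblog_rows:
        ip = row[-1]
        # Rows without a matching ip produce no output, as in an inner join.
        for country in countries_by_ip.get(ip, []):
            joined.append(row + (country,))
    return joined


if __name__ == "__main__":
    for row in inner_join_on_ip(WEBLOG_ROWS, IP_TO_COUNTRY):
        print("\t".join(row))
```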

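36. Use Hive to create a table dynamically from the result of a query on the web log weblog_entries.txt. Create a new table named weblog_entries_url_length that takes three fields from the web log data (url, request_date, and request_time) and adds a new field named "url_length" holding the length of the url string; the schema of weblog_entries.txt is given in the table below. Then query the contents of weblog_entries_url_length. Submit the commands (write all database commands in lowercase) and the last ten rows of output as text in the answer box.

hive> create table weblog_entries_url_length as select url, request_date, request_time, length(url) as url_length from weblog_entries;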

Query ID = root_20170507065123_e3105d8b-84b6-417f-ab58-21ea15723e0a

Total jobs = 1

Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1494136863427_0002)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.10 s

--------------------------------------------------------------------------------

Moving data to: hdfs://master:8020/apps/hive/warehouse/weblog_entries_url_length

Table default.weblog_entries_url_length stats: [numFiles=1, numRows=3000, totalSize=121379, rawDataSize=118379]

Time taken: 5.874 seconds

hive> select * from weblog_entries_url_length;

/qnrxlxqacgiudbtfggcg.html 2012-05-10 21:29:01 26

/sbbiuot.html 2012-05-10 21:13:47 13

/ofxi.html 2012-05-10 21:12:37 10

/hjmdhaoogwqhp.html 2012-05-10 21:34:20 19

/angjbmea.html 2012-05-10 21:27:00 14

/mmdttqsnjfifkihcvqu.html 2012-05-10 21:33:53 25

/eorxuryjadhkiwsf.html 2012-05-10 21:10:19 22

/e.html 2012-05-10 21:12:05 7

/khvc.html 2012-05-10 21:25:58 10

/c.html 2012-05-10 21:34:28 7

Time taken: 0.08 seconds, Fetched: 3000 row(s)
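
The create table ... as select statement derives url_length from url for every row. The Python sketch below reproduces that projection for a few rows taken from the output above, which also makes it easy to confirm the listed lengths. The function and variable names are assumptions for illustration.

```python
# (url, request_date, request_time) rows taken from the output listed above.
SAMPLE_ROWS = [
    ("/qnrxlxqacgiudbtfggcg.html", "2012-05-10", "21:29:01"),
    ("/sbbiuot.html", "2012-05-10", "21:13:47"),
    ("/e.html", "2012-05-10", "21:12:05"),
]


def with_url_length(rows):
    """Rough local equivalent of:
    select url, request_date, request_time, length(url) as url_length from ...
    """
    return [
        (url, request_date, request_time, len(url))
        for url, request_date, request_time in rows
    ]


if __name__ == "__main__":
    for row in with_url_length(SAMPLE_ROWS):
        print(*row)
```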
