Chapter 2: Basic use of Impala
1. Using Impala
1.1 impala-shell syntax
1.1.1 impala-shell external command-line options
Command options that can be run without entering the impala-shell interactive prompt.
impala-shell accepts a number of options when it is launched:
-h show the help documentation
impala-shell -h
-r refresh all metadata after connecting; with a large amount of data this costs the server considerable performance
impala-shell -r
-B remove the output formatting (delimited output); improves performance when querying large amounts of data
--print_header print the column names when the output is unformatted
--output_delimiter specify the field delimiter for unformatted output
-v show the version
impala-shell -v -V
-f execute a query file
--query_file specify the query file (long form of -f)
cd /export/servers
vim impala-shell.sql
use weblog;
select * from ods_click_pageviews limit 10;
# Use the -f option to execute the specified query file
impala-shell -f impala-shell.sql
-i connect to an impalad
--impalad specify the impalad that will run the task (long form of -i)
-o save the results to a file
--output_file specify the output file name (long form of -o)
impala-shell -f impala-shell.sql -o hello.txt
-p display the query plan
impala-shell -f impala-shell.sql -p
-q run a query directly from the command line, without entering the impala-shell prompt
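The options above can be combined in a single non-interactive call. The following is a minimal sketch, assuming an impalad is reachable at the hypothetical address node01:21000 and that the weblog.ods_click_pageviews table used in the query file above exists; the output file name pageviews.csv is likewise only illustrative. The command runs only if impala-shell is installed; otherwise it is printed.

```shell
#!/bin/sh
# Hypothetical connection target and output file; adjust for your cluster.
IMPALAD="node01:21000"
OUT_FILE="pageviews.csv"
QUERY="select * from weblog.ods_click_pageviews limit 10"

# -i: impalad to connect to            -B: unformatted (delimited) output
# --print_header: include column names --output_delimiter: field separator
# -q: query given on the command line  -o: file that receives the result
set -- -i "$IMPALAD" -B --print_header --output_delimiter=, -q "$QUERY" -o "$OUT_FILE"
SUMMARY="impala-shell $*"

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell "$@"
else
    echo "impala-shell not found; would run: $SUMMARY"
fi
```

With -B the result is written as plain delimited text, which is usually easier for downstream tools to parse than the default pretty-printed table.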
1.1.2 impala-shell internal commands
Commands that can be run after entering the impala-shell prompt
Enter impala-shell:
impala-shell # from any directory
help command
Shows the help documentation
connect command
connect hostname connects to the given machine; subsequent commands run there
refresh command
refresh dbname.tablename performs an incremental refresh
It refreshes the metadata of a single table, and is mainly used when the data inside an existing Hive table has changed
refresh mydb.stu;
invalidate metadata command:
invalidate metadata performs a full refresh
It has a large performance overhead, and is mainly used after a new database or table has been created in Hive
invalidate metadata;
explain command:
Shows the execution plan of a SQL statement
explain select * from stu;
explain_level can be set to 0, 1, 2 or 3; level 3 is the highest and prints the most complete information
set explain_level=3;
profile command:
Run it after a SQL statement has executed to print the detailed execution steps of that statement.
Mainly used to examine how a query ran and to tune the cluster
select * from stu;
profile;
Note: data inserted from the Hive side, or databases and tables newly created in Hive, cannot be queried in Impala directly; the metadata has to be refreshed first. Data inserted through impala-shell can be queried in Impala directly, with no refresh needed. This is implemented by the catalog service, a module added in Impala 1.2 whose main role is to synchronize metadata between the Impala nodes.
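The rule in the note above can be summarized as a small decision helper. This is a sketch only: the function name metadata_stmt, its arguments and the table name mydb.new_table are hypothetical, and the statement it prints still has to be run inside impala-shell (or passed with -q).

```shell
#!/bin/sh
# metadata_stmt CHANGE [TABLE]
#   CHANGE = "data": rows were added or changed in an existing Hive table
#   CHANGE = "new" : a database or table was newly created in Hive
# Prints the Impala statement that makes the change visible.
metadata_stmt() {
    change=$1
    table=$2
    case "$change" in
        data)
            # Incremental and comparatively cheap; needs a table name.
            [ -n "$table" ] || { echo "refresh needs a table name" >&2; return 1; }
            printf 'refresh %s;\n' "$table"
            ;;
        new)
            # Full reload; naming a table limits the reload to that table.
            printf 'invalidate metadata%s;\n' "${table:+ $table}"
            ;;
        *)
            echo "unknown change type: $change" >&2
            return 1
            ;;
    esac
}

metadata_stmt data mydb.stu
metadata_stmt new mydb.new_table
metadata_stmt new
```

Preferring refresh whenever the table already exists in Impala avoids the heavier invalidate metadata reload.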
1.2 Creating a database
1.2.1 Enter the Impala interactive prompt
impala-shell # enter the Impala interactive prompt
1.2.2 View all databases
show databases;
1.2.3 Creating and deleting databases
Create a database
CREATE DATABASE IF NOT EXISTS mydb1;
Delete a database
drop database if exists mydb;
1.3 Creating tables
Create the student table
CREATE TABLE IF NOT EXISTS mydb1.student (name STRING, age INT, contact INT );
Create the employee table
create table employee (Id INT, name STRING, age INT,address STRING, salary BIGINT);
1.3.1 Inserting data into a table
insert into employee (ID,NAME,AGE,ADDRESS,SALARY) VALUES (1, 'Ramesh', 32, 'Ahmedabad', 20000 );
insert into employee values (2, 'Khilan', 25, 'Delhi', 15000 );
Insert into employee values (3, 'kaushik', 23, 'Kota', 30000 );
Insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000 );
Insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000 );
Insert into employee values (6, 'Komal', 22, 'MP', 32000 );
Overwriting data
Insert overwrite employee values (1, 'Ram', 26, 'Vishakhapatnam', 37000 );
After the overwrite, the table contains only this one row
Another way to create a table (create table as select)
create table customer as select * from employee;
1.3.2 Querying data
select * from employee;
select name,age from employee;
1.3.3 Dropping a table
DROP table mydb1.employee;
1.3.4 Emptying a table
truncate employee;
1.3.5 Creating a view
CREATE VIEW IF NOT EXISTS employee_view AS select name, age from employee;
1.3.6 Querying the view
select * from employee_view;
1.4 The order by clause
Basic syntax
select * from table_name ORDER BY col_name [ASC|DESC] [NULLS FIRST|NULLS LAST]
Select * from employee ORDER BY id asc;
1.5 The group by clause
Select name, sum(salary) from employee Group BY name;
1.6 The having clause
Basic syntax
select col_name, aggregate_function(col) from table_name GROUP BY col_name HAVING condition
Group the table by age, select the maximum salary of each group, and show only the groups whose maximum salary is greater than 20,000
select max(salary) from employee group by age having max(salary) > 20000;
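To see which groups the having filter keeps, the query can be traced by hand on the six employee rows inserted in section 1.3.1 (before the overwrite). The sketch below emulates the grouping locally with awk; it only illustrates the semantics and is not how Impala executes the query. Unlike the SQL statement, it also prints the age so that each group is recognizable.

```shell
#!/bin/sh
# Columns: id name age salary (the six employee rows from section 1.3.1).
rows='1 Ramesh 32 20000
2 Khilan 25 15000
3 kaushik 23 30000
4 Chaitali 25 35000
5 Hardik 27 40000
6 Komal 22 32000'

# group by age  : keep the largest salary seen for each age
# having > 20000: drop groups whose maximum does not exceed 20000
result=$(printf '%s\n' "$rows" | awk '
    { s = $4 + 0; if (!($3 in max) || s > max[$3]) max[$3] = s }
    END { for (age in max) if (max[age] > 20000) print age, max[age] }
' | sort -n)

printf '%s\n' "$result"
# 22 32000
# 23 30000
# 25 35000
# 27 40000
```

The age 32 group is missing because its maximum salary is exactly 20000, which does not satisfy the strict > comparison; for age 25 only the larger of the two salaries is kept.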
1.7 The limit clause
select * from employee order by id limit 4;
2. Several ways to load data into Impala tables
The first way: load data from HDFS into Impala
create table user(id int ,name string,age int ) row format delimited fields terminated by "\t";
Prepare the data file user.txt, with the three fields of each row separated by tabs:
1 kasha 15
2 fizz 20
3 pheonux 30
4 manzi 50
Upload user.txt to the /user/impala directory on HDFS:
hdfs dfs -put user.txt /user/impala/
Check that the upload succeeded:
hdfs dfs -ls /user/impala
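Because the table is declared with fields terminated by "\t", the columns in user.txt have to be separated by real tab characters, which are easy to lose when text is copied. A minimal sketch that writes the four sample rows with explicit tabs and checks the field count before uploading; the scratch directory is an assumption made for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing file is overwritten.
workdir=$(mktemp -d)
data_file="$workdir/user.txt"

# printf reuses the format for every group of three arguments,
# producing one tab-separated line per row.
printf '%s\t%s\t%s\n' \
    1 kasha 15 \
    2 fizz 20 \
    3 pheonux 30 \
    4 manzi 50 > "$data_file"

# Count the lines that do NOT have exactly three tab-separated fields.
bad_lines=$(awk -F '\t' 'NF != 3 { n++ } END { print n + 0 }' "$data_file")
row_count=$(wc -l < "$data_file" | tr -d ' ')
echo "rows: $row_count, malformed: $bad_lines"

# Upload only when the hdfs client is available on this machine.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -put "$data_file" /user/impala/
fi
```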
Load the data
load data inpath '/user/impala/' into table user;
Query the loaded data
select * from user;
If the query does not return the data, refresh the table and query again
refresh user;
The second way:
create table user2 as select * from user;
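The third way:
insert into ... values # not recommended, because it produces a large number of small files
Do not use Impala as if it were an ordinary database
The fourth way:
insert into ... select # the most commonly used approach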
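The third and fourth ways are only named above. As a sketch, the statements could look like the following query file, which reuses the user and user2 tables from the earlier steps. The file name load_user2.sql, the inserted sample row and the age filter are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical file name, placed in the temporary directory.
sql_file="${TMPDIR:-/tmp}/load_user2.sql"

# Quoted heredoc delimiter: the SQL is written out exactly as typed.
cat > "$sql_file" <<'EOF'
-- Third way: single-row insert (illustrative row); many such statements
-- leave many small files behind.
insert into user2 values (5, 'demo', 18);
-- Fourth way: copy rows from another table in one statement.
insert into user2 select id, name, age from user where age >= 20;
EOF

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -f "$sql_file"
else
    echo "impala-shell not found; statements written to $sql_file"
fi
```

Collecting the rows in a staging table or file first and moving them with one insert into ... select keeps the number of data files low, which is the reason the fourth way is preferred.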