Chapter 2: Basic Use of Impala

1. Using Impala

1.1 impala-shell syntax

1.1.1 impala-shell command-line options

impala-shell accepts a number of options when it is launched. They take effect without entering the interactive impala-shell prompt:

-h show the help text

impala-shell -h

-r refresh all metadata after connecting; with a large amount of data this costs a lot of server performance

impala-shell -r

-B print results as plain delimited text instead of a formatted table; this improves performance when a query returns a large amount of data
--print_header print the column names as a header line when -B is used
--output_delimiter the field delimiter to use with -B
-v show the version

impala-shell -v -V

-f execute the queries in a file
--query_file long form of -f; specifies the query file

cd /export/servers
vim impala-shell.sql

Contents of impala-shell.sql:

use weblog;
select * from ods_click_pageviews limit 10;

# Run the query file with the -f option
impala-shell -f impala-shell.sql

-i connect to the given impalad

--impalad long form of -i; specifies the impalad that will run the queries

-o save the results to a file

--output_file long form of -o; specifies the output file name

impala-shell -f impala-shell.sql -o hello.txt

-p display the query plan and a detailed execution profile for every query that is run

impala-shell -f impala-shell.sql -p

-q run a query passed on the command line, without entering the interactive impala-shell prompt
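
The options above can be combined in one call. The following is a minimal Python sketch, not part of the original tutorial, that only builds and prints such a command line. The helper name build_impala_shell_cmd is hypothetical; the query and the file name hello.txt are the ones used in this section.

```python
import shlex


def build_impala_shell_cmd(query, output_file=None, delimiter=None, header=False):
    """Build an argv list for a non-interactive impala-shell run (hypothetical helper)."""
    cmd = ["impala-shell", "-q", query]  # -q: run one query without the prompt
    if delimiter is not None:
        # -B switches to plain delimited output; the delimiter applies only with -B
        cmd += ["-B", f"--output_delimiter={delimiter}"]
        if header:
            cmd.append("--print_header")  # column names as the first line
    if output_file is not None:
        cmd += ["-o", output_file]  # -o: save the results to a file
    return cmd


cmd = build_impala_shell_cmd(
    "select * from weblog.ods_click_pageviews limit 10",
    output_file="hello.txt",
    delimiter=",",
    header=True,
)

# Only print the command here. On a machine where impala-shell is installed,
# the same list could be passed to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Building the argument list instead of a single string avoids quoting problems when the query itself contains spaces or quotes.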

1.1.2 Commands inside the impala-shell prompt

These commands can be executed after entering the impala-shell prompt.

Enter impala-shell:

impala-shell  # can be run from any directory

help command

Shows the help text.

connect command

connect hostname connects to the impalad on the given machine; the statements that follow are executed there.

refresh command

refresh dbname.tablename performs an incremental refresh: it reloads the metadata of a single table. It is mainly used after the data of an existing table has been changed in Hive.

refresh mydb.stu;

invalidate metadata command

invalidate metadata performs a full refresh. It has a large performance overhead and is mainly used after new databases or tables have been created in Hive.

invalidate metadata

explain command

Shows the execution plan of a SQL statement.

explain select * from stu;

explain_level can be set to 0, 1, 2, or 3. Level 3 is the highest and prints the most complete information.

set explain_level=3;

profile command

Run after a SQL statement has been executed, profile prints the detailed execution steps of that statement. It is mainly used to inspect how a query ran and to tune the cluster.

select * from stu;
profile;

Note: data inserted from the Hive side, and databases or tables created there, cannot be queried in Impala directly; the metadata must be refreshed first. Data inserted through impala-shell can be queried in Impala immediately, with no refresh. This is implemented by the catalog service, a module added in Impala 1.2 whose main role is to synchronize metadata between the Impala nodes.
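
The refresh rules in this section can be condensed into a small decision helper. This is only an illustrative Python sketch: the function name metadata_sync_statement and its parameter are hypothetical, and the returned text is the statement one would then type in impala-shell.

```python
def metadata_sync_statement(table, created_in_hive):
    """Pick the statement to run in impala-shell after a change made through Hive."""
    if created_in_hive:
        # New database or table: Impala has no metadata for it yet,
        # so the expensive full reload is needed.
        return f"INVALIDATE METADATA {table};"
    # Existing table whose data changed: reloading that one table is enough.
    return f"REFRESH {table};"


print(metadata_sync_statement("mydb.stu", created_in_hive=False))
print(metadata_sync_statement("mydb.stu", created_in_hive=True))
```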

1.2 Creating a database

1.2.1 Entering the Impala interactive shell

impala-shell  # enter the Impala interactive shell

1.2.2 Viewing all databases

show databases;

1.2.3 Creating and deleting databases

Create a database, and drop a database if it exists:

CREATE DATABASE IF NOT EXISTS mydb1;
drop database  if exists  mydb;

1.3 Creating a table

Create the student table:

CREATE TABLE IF NOT EXISTS mydb1.student (name STRING, age INT, contact INT );

Create the employee table:

create table employee (id INT, name STRING, age INT, address STRING, salary BIGINT);

1.3.1 Inserting data

insert into employee (id, name, age, address, salary) values (1, 'Ramesh', 32, 'Ahmedabad', 20000);
insert into employee values (2, 'Khilan', 25, 'Delhi', 15000);
insert into employee values (3, 'kaushik', 23, 'Kota', 30000);
insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000);
insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000);
insert into employee values (6, 'Komal', 22, 'MP', 32000);

Overwriting data

Insert overwrite employee values (1, 'Ram', 26, 'Vishakhapatnam', 37000 );

After the overwrite, the table contains only this one row.

Another way to create a table (create table as select):

create table customer as select * from employee;

1.3.2 Querying data

select * from employee;
select name,age from employee; 

1.3.3 Dropping a table

DROP table  mydb1.employee;

1.3.4 Truncating a table

truncate  employee;

1.3.5 Creating a view

CREATE VIEW IF NOT EXISTS employee_view AS select name, age from employee;

1.3.6 Querying the view

select * from employee_view;

1.4 ORDER BY clause

Basic syntax:

select * from table_name ORDER BY col_name [ASC|DESC] [NULLS FIRST|NULLS LAST]

Example:

select * from employee ORDER BY id asc;

1.5 GROUP BY clause

Select name, sum(salary) from employee Group BY name; 

1.6 HAVING clause

Basic syntax:

select col_name, aggregate_function(col_name) from table_name GROUP BY col_name HAVING condition

Group the table by age, select the maximum salary in each group, and show only the groups whose maximum salary is greater than 20000:

select max(salary) from employee group by age having max(salary) > 20000;
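
To see which groups survive the HAVING filter, the query can be tried on the six rows inserted in section 1.3.1. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module rather than Impala, and it adds the age column and an ORDER BY to the query so the output is easier to read.

```python
import sqlite3

# SQLite is only a local stand-in here, not Impala; the query is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee (id int, name text, age int, address text, salary bigint)"
)
conn.executemany(
    "insert into employee values (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 20000),
        (2, "Khilan", 25, "Delhi", 15000),
        (3, "kaushik", 23, "Kota", 30000),
        (4, "Chaitali", 25, "Mumbai", 35000),
        (5, "Hardik", 27, "Bhopal", 40000),
        (6, "Komal", 22, "MP", 32000),
    ],
)

# HAVING is applied to each group after aggregation, unlike WHERE,
# which filters individual rows before grouping.
rows = conn.execute(
    "select age, max(salary) from employee "
    "group by age having max(salary) > 20000 order by age"
).fetchall()

for age, top_salary in rows:
    print(age, top_salary)
```

The group for age 32 is dropped because its maximum salary is exactly 20000, which is not greater than 20000. The group for age 25 is kept because one of its two rows has a salary of 35000.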

1.7 LIMIT clause

select * from employee order by id limit 4;

2. Ways to load data into an Impala table

The first way: load data from HDFS into Impala

create table user(id int ,name string,age int ) row format delimited fields terminated by "\t";

Prepare the data file user.txt and upload it to the /user/impala path on HDFS.

Upload user.txt to HDFS:

hdfs dfs -put user.txt /user/impala/

Check that the upload succeeded:

hdfs dfs -ls /user/impala

Contents of user.txt (fields separated by tabs):

1	kasha	15
2	fizz	20
3	pheonux	30
4	manzi	50
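
Because the table was declared with tab-terminated fields, the file has to contain real tab characters, which are easy to lose when copying the rows from a web page. One way to produce the file before the upload step is a short Python script; this is a sketch under that assumption, not part of the original tutorial.

```python
import csv

# Sample rows from this section: id, name, age.
rows = [(1, "kasha", 15), (2, "fizz", 20), (3, "pheonux", 30), (4, "manzi", 50)]

# Write one row per line with a single tab between fields and no header,
# matching the table's `fields terminated by "\t"` declaration.
with open("user.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

with open("user.txt") as f:
    content = f.read()
print(content)
```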

Load the data:

load data inpath '/user/impala/' into table user;

Query the loaded data:

select  *  from  user;

If the query returns no data, refresh the table:

refresh  user;

The second way:

create  table  user2   as   select * from  user;

The third way:

insert into ... values  # not recommended, because it produces a large number of small files

Do not use Impala like an ordinary database.

The fourth way:

insert into ... select ...  # the most commonly used

Origin www.cnblogs.com/-xiaoyu-/p/11186672.html