10 simple and useful Hive commands you should know first

1. show databases;
Your tables are usually stored in databases, and this line of code lists the names of the databases you have access to.
Every SQL or Hive statement should end with a semicolon (;).

2. use database_name;
Choose one name from the above list, and this line of code switches you into that database.

3. show tables;
You can now get the names of the tables in that database.
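
Items 1 to 3 could be run one after another as a short session. This is only a sketch: database_name is the same placeholder used above, and the last two statements are optional extras that are not part of the original list (current_database() assumes Hive 0.13 or later).

```sql
-- list the databases you have access to
show databases;

-- enter one of them (database_name is a placeholder)
use database_name;

-- list the tables in the database you just entered
show tables;

-- optional: list the tables of a database without entering it
show tables in database_name;

-- optional: check which database you are currently in
select current_database();
```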

4. desc table_name;
or desc database_name.table_name;
desc is short for describe, and the two are interchangeable.
This line of code returns some details about that particular table.
If you have already entered the database (use database_name;), you can write (desc table_name;) directly. Otherwise, prefix the table name with the database name and a dot (.) to indicate which database the table lives in. If you work across several databases, it is easier to always write the database name together with the table name than to run use every time you switch databases.
The result shows the names of the columns in your table and the type of the values they contain. The third column of the result holds the comments for the columns (variables), if there are any.
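
If the three columns returned by desc are not enough, Hive also has a more detailed variant. A minimal sketch, using the same placeholder names as above:

```sql
-- column names, types and comments
describe database_name.table_name;

-- the same, plus table details such as owner, location and table type
describe formatted database_name.table_name;
```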

For example, we have a table here with 3 columns for students' performance in a subject. If we describe the table, we get the following:

    id      string
    grades  int
    gender  string

5. set hive.cli.print.header=true;
When you describe a table, the result comes back as several columns, but without column names you may have no idea what the values are telling you. This line of code turns on the header for returned results. The next time you get a result, the column names are printed on the first row.
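
A minimal sketch of how the setting could be used in the Hive CLI. The third statement is an optional extra that is not part of the original tip: it assumes Hive 0.13 or later, where the headers of select results are otherwise printed as table_name.column_name.

```sql
-- print column names as the first row of every result
set hive.cli.print.header=true;

-- show the current value of the setting
set hive.cli.print.header;

-- optional: print column_name instead of table_name.column_name in headers
set hive.resultset.use.unique.column.names=false;

describe database_name.table_name;
```

These settings only last for the current session, so you would need to run them again the next time you start Hive.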

If we describe the above table again, we will get the following with the 3 column names, namely col_name, data_type and the last one, comment.

    col_name  data_type  comment
    id        string
    grades    int
    gender    string

6. desc function min;
or describe function min;
Hive provides many built-in functions. You can learn about them by googling, but you can also get help directly from Hive. For example, to find out about the Hive function min, run (desc function min;). You will get the following description of the min function:
min(expr) - Returns the minimum value of expr
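
To discover which functions exist in the first place, or to see a longer description, the following could help. A minimal sketch; min is just the example function from above:

```sql
-- list all functions available in your Hive session
show functions;

-- short description of one function
describe function min;

-- longer description of the same function, where Hive has one
describe function extended min;
```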


7. select * from database_name.table_name limit 10;
Now it is time to play with select, the most important keyword in SQL and Hive. The names of your columns follow select. The * means that you would like to select all of the columns. You should not omit the limit at the end, which asks for only 10 rows from the table; without it, Hive will try to print every row, which is painful if the table is very large. If the table has fewer than 10 rows, all of them are printed; otherwise only 10 rows are printed.

 


8. select column_1, column_2 from database_name.table_name limit 10;
Now you can specify the columns you would like to look at, instead of selecting all the columns with *. Still, do not forget the limit.

For example: select id, grades from database_name.table_name limit 10;
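
Note that limit on its own does not promise which 10 rows you get. If you care about that, one option is to sort first. A minimal sketch, assuming the id and grades columns of the example table:

```sql
-- the 10 rows with the highest grades, instead of 10 arbitrary rows
select id, grades
from database_name.table_name
order by grades desc
limit 10;
```

Keep in mind that order by makes Hive sort the data, so on a very large table this is slower than a plain limit.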


9. select count(*) from database_name.table_name;
To find out how many rows there are in this table, use this line of code. If a table has only 100 rows, I may skip the limit when I select rows from it, but adding limit is always a good habit. You can replace the * with 1, 2 or any other constant and get the same result. If you replace the * with a particular column name, you get the number of rows where that column's value is not NULL instead.
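
The difference between count(*) and count(column) could be checked in one query. A minimal sketch, assuming the example table with the four rows inserted at the end of this post; the column aliases are made up for illustration:

```sql
-- total_rows and total_rows_again count every row
-- rows_with_grades counts only the rows where grades is not NULL
-- distinct_genders counts the number of different gender values
select
    count(*)               as total_rows
  , count(1)               as total_rows_again
  , count(grades)          as rows_with_grades
  , count(distinct gender) as distinct_genders
from database_name.table_name;
```

With those four rows, none of which has a NULL grade, the first three counts are all 4 and the last one is 2.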

10. select * from database_name.table_name where (conditions);
Now you would like to select only the rows that meet some requirements.
For example, from the above table, you would like to know the students whose grades are not equal to 100.

select * from database_name.table_name
where grades != 100;
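
Conditions can be combined with and / or, and NULL needs a little care: a row whose grades is NULL does not satisfy grades != 100, so it is silently left out. A minimal sketch, assuming the example table's columns:

```sql
-- students whose grades are not equal to 100
select *
from database_name.table_name
where grades != 100;

-- same, but also keep the rows where grades is NULL
select *
from database_name.table_name
where grades != 100 or grades is null;

-- combine several conditions
select id, grades
from database_name.table_name
where grades = 100 and gender = 'Female'
limit 10;
```

With the four example rows created at the end of this post, the first query returns only the student with id 9.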

 



If you would like to create the above table, you can do so with the following code (the insert ... values syntax needs Hive 0.14 or later):

create table database_name.table_name
(
id string
, grades int
, gender string
);

insert into database_name.table_name values ('1',100,'Female'), ('2',100,'Female'), ('4',100,'Male'), ('9',99,'Male');

Reposted from blog.csdn.net/henbile/article/details/86716439