Big Data Technologies Summer Internship, Part 7: Precise Internet Marketing Decisions (Loading the Data Source)

1. Enter the Hadoop environment (run the start scripts from the Hadoop installation directory; if SSH is already configured, the start command can be run directly).

2. Start the Hive process (configuration follows the widely available online tutorials, so it is not covered again here).

  Enter the Hive shell.

3. Load data into the Hive database. (In real project work, select * is not recommended; query the specific column names you need instead. If you only want to check the table structure or preview the data, add a limit clause, otherwise the machine will grind to a halt; see the sketch below.)
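A minimal sketch of that advice (the table and column names some_table, col_a and col_b are made up for illustration):

hive> select col_a, col_b from some_table limit 10;    ## named columns plus a limit, instead of select *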

hive> show tables;                                                ## list the tables
hive> desc formatted hive_table;                                  ## detailed description of hive_table
hive> alter table old_name rename to new_name;                    ## rename a table
hive> alter table table_name add columns (column_name type);      ## add a column
hive> alter table table_name change id test_id int;               ## rename column id to test_id
hive> alter table table_name change id test_id double after age;  ## rename id to test_id, change its type to double, and move it after age
hive> alter table table_name replace columns (cc int, bb string, id int);  ## replace the columns (modify or replace the table's full column list)
hive> truncate table stu_info;                                    ## truncate only clears the table's data
hive> drop table stu_test;                                        ## drop the table together with its metadata
hive> drop database hive_drop;                                    ## drop a database
hive> drop database hive_test cascade;                            ## drop a database that still contains tables
hive> desc function when;                                         ## view a function's usage
hive> desc function extended case;                                ## view a function's detailed usage
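To make the change ... after syntax above concrete, here is a minimal sketch on a hypothetical table demo_tbl (not from the original project):

hive> create table demo_tbl (id int, age int, name string);
hive> alter table demo_tbl change id test_id double after age;
hive> desc demo_tbl;    ## now shows: age int, test_id double, name string (only the metadata changes; existing data files are not rewritten)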
Basic Hive database operations

1. The action table
use Kettle;
create table action1 (
  user_id string,
  goods_id string,
  user_action int,
  deal_month string,
  deal_day string)
row format delimited fields terminated by ',';    -- fields separated by commas

Load local data:
load data local inpath 'path' into table action1;
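For example (the file path here is hypothetical; substitute the actual location of your data file):

load data local inpath '/home/hadoop/data/action.csv' into table action1;    -- with LOCAL, the file is copied into the table's warehouse directory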

Combine deal_month and deal_day into deal_time, spliced with "-", and save the result to a new table:
create table action as select user_id, goods_id, user_action, concat(deal_month, '-', deal_day) as deal_time from action1;
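A quick sanity check of the splicing (the literals below are made up):

select concat('09', '-', '08');          -- returns '09-08'
select deal_time from action limit 5;    -- preview a few spliced deal_time values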

Check the structure of the new table:
desc action; 

View the data:
select * from table_name limit 100;
 
Export table data to a local file:
insert overwrite local directory 'local_path' row format delimited fields terminated by ',' select * from table_name;    -- the quoted path must not contain spaces; only the exported files are kept under this path, and the next export to it will overwrite them

2. The sail_info table
Pay attention to the encoding when converting the source file (typically convert the CSV to UTF-8 TXT), otherwise the text will be garbled.

Create the table:
create table sail_info (
  goods_id string, goods_name string, goods_property string, store_name string,
  store_id string, goods_url string, goods_price float, keyword string,
  sail_count int, good_rate int, brand string, model string, color string,
  time_to_market string, operate_system string)
row format delimited fields terminated by ',';

Delete the rows where a column is empty, and re-store the result:
create table if not exists new_table_name as select * from table_name where length(column_name) > 1;
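Why the filter works: length of an empty string is 0, so empty fields fail length(column_name) > 1 (single-character values are dropped as well). A quick check with made-up literals:

select length(''), length('x'), length('abc');    -- returns 0, 1, 3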

3. The user_info table

Create the table:
create table if not exists default.user_info (userid string, username string, address string, gender string, birthday string) row format delimited fields terminated by '\t';

Convert the birthday into an age and store it in another table:
create table if not exists user_info_age as select userid, username, address, gender, round(datediff('2019-9-8 15:00:00', regexp_replace(concat(birthday, '15:00:00'), "\"", " ")) / 365) as age from user_info limit 50;
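A rough check of the age formula (the dates below are made up): datediff returns the number of days between the two dates, and dividing by 365 and rounding gives the approximate age in years.

select round(datediff('2019-09-08', '1990-05-01') / 365);    -- returns 29, the approximate age in years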

Export:
insert overwrite local directory '/home/hadoop/data' row format delimited fields terminated by ',' select * from user_info_age;

Age brackets:
select userid, age, case when (age < 18) then '1' when (age between 18 and 24) then '2' when (age between 25 and 29) then '3' else '7' end as regin from user_info_age;

Remove empty rows:
create table if not exists new_table_name as select * from table_name where length(column_name) > 1;

Store the age brackets into a new table:
create table if not exists user_info_regin as select userid, username, address, gender, case when (age < 18) then '1' when (age between 18 and 24) then '2' when (age between 25 and 29) then '3' when (age between 30 and 34) then '4' when (age between 35 and 39) then '5' when (age between 40 and 49) then '6' else '7' end as regin from user_info_age_true;

Label each age bracket with a readable description:
create table if not exists user_info_regin_alias as select userid, username, address, gender, regin, case when (regin = 1) then 'under 18' when (regin = 2) then '18 to 24 years old' when (regin = 3) then '25 to 29 years old' when (regin = 4) then '30 to 34 years old' when (regin = 5) then '35 to 39 years old' when (regin = 6) then '40 to 49 years old' else '50 years old and over' end as user_age_regin_alias from user_info_regin;

Remove the quotation marks from the userid field of user_info_regin_alias:
create table if not exists user_info as select regexp_replace(userid, "\"", "") as userid, username, address, gender, regin, user_age_regin_alias from user_info_regin_alias;
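A quick check of the quote-stripping expression (the literal is made up): the pattern "\"" matches the double-quote characters, which are replaced with nothing.

select regexp_replace('"u001"', "\"", "");    -- returns u001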
 
Join with the comment table to obtain the userrank field:
create table if not exists userinfo as select user_info.userid, user_info.username, user_info.address, user_info.gender, user_info.regin, user_info.user_age_regin_alias, comment_ture.userrank from user_info join comment_ture on user_info.userid = comment_ture.userid;
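Since this is an inner join, only userids present in both tables are kept; a quick way to see the effect is to compare row counts (a sketch, assuming the tables above):

select count(*) from user_info;    -- rows before the join
select count(*) from userinfo;     -- rows after the join; fewer rows means some userids had no match in comment_ture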
The data source is now loaded. The structure and contents of the resulting tables are as follows (screenshots omitted).

Origin: www.cnblogs.com/wjwjs/p/11504244.html