Hive case (4)

Case: Python or R, which is better suited to big data (Spark/Hadoop) and deep learning?

Question 1: Among big data (Spark/Hadoop) users, which language is used by more people, Python or R?

Prepare the data: the data set is the CSV file sw17-top11-dl-sh.anon.csv, which you can download from my uploaded resources.

Now let's answer the question in Hive.

#Create database

CREATE DATABASE db_language;

#Create table

CREATE TABLE db_language.tb_language_account(

id_number string,

area string,

python string,

r string,

sql_str string,

rapidminer string,

excel string,

spark string,

mangshe string,

tensorflow string,

scikit_learn string,

tableau string,

knime string,

deep string,

spark_hadoop string,

ntools int,

votetools string

)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

LINES TERMINATED BY "\n";

 

#Import Data

LOAD DATA LOCAL INPATH '/opt/data/sw17-top11-dl-sh.anon.csv'

INTO TABLE db_language.tb_language_account;
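
Before running the counts, it can help to confirm that the load worked. The following is a minimal sketch against the table defined above. The last statement is an assumption: the source does not say whether the CSV starts with a column-name row, so apply it only if the first loaded row turns out to be a header.

```sql
-- Show the column names and types of the new table.
DESCRIBE db_language.tb_language_account;

-- Look at a few loaded rows to check that the fields line up with the columns.
SELECT * FROM db_language.tb_language_account LIMIT 5;

-- Assumption: only needed if the CSV begins with a header row.
-- This table property tells Hive to skip the first line of each data file.
ALTER TABLE db_language.tb_language_account
SET TBLPROPERTIES ("skip.header.line.count"="1");
```

A header row would not change the counts in this case, because the queries only keep rows where the tool columns equal "1", but skipping it keeps SELECT * output clean.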

# Among big data (Spark/Hadoop) users, how many use Python? (683)

# Aggregate functions: count(), sum(), avg(), max(), ...

SELECT

count(*) as count

FROM

db_language.tb_language_account

WHERE

python="1" AND spark_hadoop="1";

# Among big data (Spark/Hadoop) users, how many use R? (606)

SELECT

count(*) as count

FROM

db_language.tb_language_account

WHERE

R="1" AND spark_hadoop="1";

# Merge the results:

# count: 683 (Python), 606 (R)

SELECT

t1.p_c,t2.r_c

FROM

(SELECT

count(*) as p_c, "1" as id

FROM

db_language.tb_language_account

WHERE

python="1" AND spark_hadoop="1"

)t1

JOIN(

SELECT

count(*) as r_c,"1" as id

FROM

db_language.tb_language_account

WHERE

R="1" AND spark_hadoop="1"

)t2

on

t1.id=t2.id;
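
The join above scans the table twice and links the two one-row results through a constant id column. As an alternative sketch (not the approach used in this post), the same two counts can be produced in a single pass with conditional aggregation. It assumes, as the queries above do, that the tool columns hold the string "1" for respondents who use the tool.

```sql
-- Filter to Spark/Hadoop users once, then count Python and R users
-- side by side with CASE expressions inside sum().
SELECT
  sum(CASE WHEN python = "1" THEN 1 ELSE 0 END) AS p_c,
  sum(CASE WHEN r = "1" THEN 1 ELSE 0 END) AS r_c
FROM
  db_language.tb_language_account
WHERE
  spark_hadoop = "1";
```

One difference to be aware of: if no row has spark_hadoop = "1", sum() returns NULL for both columns, whereas the count(*) subqueries return 0.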

 

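# Note

My data file is stored at /opt/data/sw17-top11-dl-sh.anon.csv, and all of the operations above were run in Hive.

# Reminder

Do not forget the trailing ";" at the end of every SQL statement you run in Hive.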

 

 

 


Origin blog.csdn.net/qq_41934990/article/details/81902588