Case: Python or R, which is more suitable for big data (Spark/Hadoop) and deep learning?
Question 1: Among big data (Spark/Hadoop) users, which language do more people use, Python or R?
Prepare the data (see the screenshot); the dataset itself is available among my uploaded resources for download.
Now let's solve the problem above.
#Create database
CREATE DATABASE db_language;
#Create table
CREATE TABLE db_language.tb_language_account (
  id_number string, area string, python string, r string, sql_str string,
  rapidminer string, excel string, spark string, mangshe string,
  tensorflow string, scikit_learn string,
  tableau string,  -- column name assumed: Tableau, the 10th of the top-11 tools
  knime string, deep string, spark_hadoop string, ntools int, votetools string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
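To see what `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` and `LINES TERMINATED BY '\n'` mean in practice, here is a minimal Python sketch that splits text the same way: one row per line, one field per comma, with no quote handling. The column list mirrors the table definition; `tableau` is an assumed name for the 10th tool column, `parse_delimited` is a hypothetical helper (not part of Hive), and the sample line is made up.

```python
# Column order copied from the CREATE TABLE statement.
# "tableau" is an assumed name for the 10th tool column.
COLUMNS = [
    "id_number", "area", "python", "r", "sql_str", "rapidminer", "excel",
    "spark", "mangshe", "tensorflow", "scikit_learn", "tableau", "knime",
    "deep", "spark_hadoop", "ntools", "votetools",
]


def parse_delimited(text):
    """Split text into rows on newlines and fields on commas (no quote handling)."""
    rows = []
    for line in text.split("\n"):
        if not line:
            continue  # skip empty lines, such as the one after a trailing newline
        fields = line.split(",")
        # Pad short lines with None and drop any extra fields.
        fields = (fields + [None] * len(COLUMNS))[: len(COLUMNS)]
        row = dict(zip(COLUMNS, fields))
        # ntools is declared int; missing or non-numeric values become None.
        try:
            row["ntools"] = int(row["ntools"])
        except (TypeError, ValueError):
            row["ntools"] = None
        rows.append(row)
    return rows


# Hypothetical input line with all 17 fields.
sample = "u1,US,1,0,1,0,0,1,0,0,0,0,0,0,1,4,tool\n"
print(parse_delimited(sample)[0]["spark_hadoop"])
```

Padding with None instead of raising keeps short lines usable; the `python="1"` style filters used later then simply do not match them.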
#Import Data
LOAD DATA LOCAL INPATH '/opt/data/sw17-top11-dl-sh.anon.csv' INTO TABLE db_language.tb_language_account;
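`LOAD DATA LOCAL INPATH` copies the local file into the table's storage as-is; Hive applies the schema only when the table is read. The sketch below uses Python's built-in `sqlite3` as a stand-in for Hive so the loading step can be tried without a cluster. Unlike Hive, it parses each line at load time. `load_delimited` is a hypothetical helper, every column is stored as TEXT for simplicity, and the two input rows are made up.

```python
import sqlite3


def load_delimited(conn, table, columns, lines):
    """Create `table` (all TEXT columns) and insert one row per comma-delimited line.

    `table` and `columns` are interpolated into the SQL, so they must be trusted names.
    """
    col_defs = ", ".join(f"{name} TEXT" for name in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split(",")
        # Pad short lines with None and drop extra fields.
        fields = (fields + [None] * len(columns))[: len(columns)]
        rows.append(fields)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n = load_delimited(
    conn,
    "tb_language_account",
    ["id_number", "python", "r", "spark_hadoop"],  # reduced column set for the demo
    ["a,1,0,1\n", "b,0,1,1\n"],  # hypothetical rows
)
print(n)
```

An open file object can be passed as `lines`, since iterating over it yields one line at a time.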
# Big data Spark/Hadoop: how many people use Python (683)
# Aggregate functions: count(), sum(), avg(), max(), ...
SELECT count(*) AS count FROM db_language.tb_language_account WHERE python="1" AND spark_hadoop="1";
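The count query, together with the other aggregate functions listed above, can be tried on a small scale with `sqlite3` standing in for Hive. The table here keeps only four of the columns and is filled with made-up rows (hypothetical data, not the survey results), and `python_bigdata_stats` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical toy rows: (python, r, spark_hadoop, ntools).
TOY_ROWS = [
    ("1", "0", "1", 3),
    ("1", "1", "1", 5),
    ("0", "1", "1", 2),
    ("1", "0", "0", 4),
]


def python_bigdata_stats(conn):
    """count/sum/avg/max of ntools over Python users who also chose Spark/Hadoop."""
    return conn.execute(
        "SELECT count(*), sum(ntools), avg(ntools), max(ntools) "
        "FROM tb_language_account WHERE python = '1' AND spark_hadoop = '1'"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_language_account "
    "(python TEXT, r TEXT, spark_hadoop TEXT, ntools INTEGER)"
)
conn.executemany("INSERT INTO tb_language_account VALUES (?, ?, ?, ?)", TOY_ROWS)
print(python_bigdata_stats(conn))  # (2, 8, 4.0, 5)
```

Only the first two toy rows satisfy both conditions, so the count is 2 and the aggregates are taken over their ntools values 3 and 5.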
# Big data Spark/Hadoop: how many people use R (606)
SELECT count(*) AS count FROM db_language.tb_language_account WHERE R="1" AND spark_hadoop="1";
# Combine the results:
# Result: p_c = 683 (Python), r_c = 606 (R)
SELECT t1.p_c, t2.r_c
FROM (SELECT count(*) AS p_c, "1" AS id FROM db_language.tb_language_account WHERE python="1" AND spark_hadoop="1") t1
JOIN (SELECT count(*) AS r_c, "1" AS id FROM db_language.tb_language_account WHERE R="1" AND spark_hadoop="1") t2
ON t1.id = t2.id;
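The JOIN works by giving both one-row subqueries the same constant `id`, so their counts line up side by side. A possible alternative, sketched below on hypothetical toy rows with `sqlite3` standing in for Hive, computes both counts in a single scan with conditional aggregation (`count(CASE WHEN ... THEN 1 END)`), which is standard SQL that HiveQL also accepts. The sketch checks that both forms give the same answer; it uses single-quoted `'1'` because SQLite treats double-quoted text as an identifier first.

```python
import sqlite3

# Same shape as the combined query: two one-row subqueries joined on a constant id.
JOIN_SQL = """
SELECT t1.p_c, t2.r_c
FROM (SELECT count(*) AS p_c, '1' AS id FROM tb_language_account
      WHERE python = '1' AND spark_hadoop = '1') t1
JOIN (SELECT count(*) AS r_c, '1' AS id FROM tb_language_account
      WHERE r = '1' AND spark_hadoop = '1') t2
ON t1.id = t2.id
"""

# Alternative: one pass over the table; count() skips the NULLs produced
# when a CASE has no matching branch.
SINGLE_SCAN_SQL = """
SELECT count(CASE WHEN python = '1' THEN 1 END) AS p_c,
       count(CASE WHEN r = '1' THEN 1 END) AS r_c
FROM tb_language_account
WHERE spark_hadoop = '1'
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_language_account (python TEXT, r TEXT, spark_hadoop TEXT)")
conn.executemany(
    "INSERT INTO tb_language_account VALUES (?, ?, ?)",
    # Hypothetical rows: (python, r, spark_hadoop).
    [("1", "0", "1"), ("1", "1", "1"), ("0", "1", "1"), ("1", "0", "1"), ("1", "1", "0")],
)
joined = conn.execute(JOIN_SQL).fetchone()
single = conn.execute(SINGLE_SCAN_SQL).fetchone()
print(joined, single)  # (3, 2) (3, 2)
```

Because `count` ignores NULLs, each `CASE` without an `ELSE` counts only matching rows and returns 0 rather than NULL when nothing matches.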
# Notes
My data file is located at /opt/data/sw17-top11-dl-sh.anon.csv, and all of the operations above were performed in Hive.
# Caution
When running SQL statements in Hive, don't forget the trailing ";". Keep this in mind!