05.03 MapReduce data analysis

Hadoop data analysis: Hadoop is written in Java, and its programs are packaged as jar files

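map: split the input apart and emit it in a uniform (key, value) format; reduce: compute a statistic over the values collected for each key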
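
As a rough illustration of the two phases, here is a plain-Python sketch. It is not Hadoop's API; the function names map_phase, shuffle, reduce_phase and word_count are made up for this example, and the input lines are sample data.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: split every line into words and emit one (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group the emitted values by key (the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values into one statistic, here a sum."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines):
    """Run the three steps in order on an iterable of text lines."""
    return reduce_phase(shuffle(map_phase(lines)))

# Made-up sample input
print(word_count(["hello hadoop", "hello spark"]))
# {'hello': 2, 'hadoop': 1, 'spark': 1}
```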

cd opt-->cd hadoop-2.9.0-->cd share-->cd hadoop-->cd mapreduce

Under mapreduce:

hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /user/hadoop/haha /user/output

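wordcount is the classic MapReduce data analysis example. Here /user/hadoop/haha is the input path in HDFS and /user/output is the output directory, which must not exist before the job runs. The version number in the jar name should match the examples jar actually present in the directory.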

View the result: hadoop fs -cat /user/output/part-r-00000
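
Each line of part-r-00000 holds a word and its count separated by a tab. The following Python sketch ranks such output once it has been saved as text; the helper name top_words and the sample data are made up for illustration.

```python
def top_words(output_text, n=3):
    """Parse wordcount output ("word<TAB>count" per line) and return the n most frequent words."""
    counts = []
    for line in output_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.rsplit("\t", 1)
        counts.append((word, int(count)))
    # Sort by count descending, then alphabetically so ties come out in a fixed order
    return sorted(counts, key=lambda wc: (-wc[1], wc[0]))[:n]

sample = "hadoop\t1\nhello\t2\nspark\t1\n"  # made-up sample data
print(top_words(sample, n=2))  # [('hello', 2), ('hadoop', 1)]
```

The text could be obtained by redirecting the command above to a local file, for example hadoop fs -cat /user/output/part-r-00000 > result.txt.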


Scala data analysis: the syntax is concise enough that the whole word count fits on one line; Scala runs on the JVM and interoperates with Java

Command: spark-shell 

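val lines = sc.textFile("/user/hadoop/haha")  // assumption: the same HDFS input as the Hadoop example
lines.map(x=>x.split(' ')).flatMap(x=>for(i<-x) yield (i,1)).groupByKey().map(x=>(x._1,x._2.sum)).collect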




Python data analysis: Python has a rich set of libraries, but it generally runs less efficiently than the JVM-based options above

Command: pyspark
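
Inside the pyspark shell a SparkContext is already available as sc, so the same word count might look like the sketch below. The function name spark_word_count is made up here, and the input path is assumed to be the same HDFS file used in the Hadoop example. It uses reduceByKey rather than groupByKey followed by a sum, since reduceByKey combines values on each partition before the shuffle.

```python
def spark_word_count(lines):
    """Count words in an RDD of text lines; returns an RDD of (word, count) pairs."""
    return (lines
            .flatMap(lambda line: line.split(" "))  # one record per word
            .map(lambda word: (word, 1))            # emit (word, 1)
            .reduceByKey(lambda a, b: a + b))       # add up the 1s for each word

# In the pyspark shell, where sc is predefined (path assumed from the Hadoop example):
# spark_word_count(sc.textFile("/user/hadoop/haha")).collect()
```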




