Hadoop data analysis: based on the Java language; programs are all packaged as jar files
map: split the input and format it as key/value pairs; reduce: aggregate the statistics for each key
cd opt/hadoop-2.9.0/share/hadoop/mapreduce
Under mapreduce:
hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /user/hadoop/haha /user/output
Data analysis on Hadoop is always done with MapReduce; wordcount is the standard example
View the result: hadoop fs -cat /user/output/part-r-00000
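
What the wordcount job computes can be sketched in plain Python. This is only an illustration of the map, shuffle and reduce steps, not the code inside the Hadoop examples jar; the function names and the sample lines are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: split one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group the emitted values by key, between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input lines standing in for the file under /user/hadoop/haha.
sample_lines = ["hello hadoop", "hello spark"]
counts = word_count(sample_lines)

# Print one "word<TAB>count" line per word, sorted by word.
for word in sorted(counts):
    print(f"{word}\t{counts[word]}")
```

The printed lines have the same shape as the content of part-r-00000: one word and its count per line, separated by a tab.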
Scala language data analysis: the syntax is concise (the whole analysis can be written in one line); Scala is based on Java and runs on the JVM
Command: spark-shell
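val lines = sc.textFile("/user/hadoop/haha")  // assumption: lines is read from the same input path as the Hadoop example
lines.map(x => x.split(' ')).flatMap(x => for (i <- x) yield (i, 1)).groupByKey().map(x => (x._1, x._2.sum)).collect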
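
To see what each stage of the one-line chain produces, the same steps can be traced in plain Python on a small list. This is a sketch without Spark; the input lines are hypothetical and ordinary lists and dicts stand in for RDDs.

```python
# Hypothetical input: each string plays the role of one line of the text file.
lines = ["a b a", "b c"]

# map(x => x.split(' ')): one list of words per line.
split_lines = [line.split(' ') for line in lines]

# flatMap(x => for (i <- x) yield (i, 1)): flatten into (word, 1) pairs.
pairs = [(word, 1) for words in split_lines for word in words]

# groupByKey(): collect all the 1s that belong to the same word.
grouped = {}
for word, one in pairs:
    grouped.setdefault(word, []).append(one)

# map(x => (x._1, x._2.sum)): sum the grouped values for each word.
counts = [(word, sum(ones)) for word, ones in grouped.items()]

# collect returns the pairs to the driver; sorted here only for display.
print(sorted(counts))
```

In Spark the order of the collected pairs is not guaranteed, which is why the sketch sorts before printing.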
Python language data analysis: rich Python function libraries, but lower execution efficiency
Command: pyspark