Prediction(2) R Running through Spark/Hadoop Cluster
1. How we Load the Config in R
install.packages("yaml", repos="http://cran.rstudio.com/")
library("yaml")
config = yaml.load_file("config.yaml")
config$spark$home
This code runs in RStudio, and we can also run the script directly from the shell:
> Rscript scripts/WordCount.R
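The config.yaml referenced above is assumed to hold the Spark settings the script reads later (config$spark$home and config$spark$server). A minimal sketch of its layout — the master URL is a placeholder, and the home path matches the install directory used below:

```yaml
# Assumed layout of config.yaml; replace the master URL with your own
spark:
  home: /home/carl/install/spark-1.4.1-bin-hadoop2.6
  server: spark://yourmaster:7077
```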
2. Prepare Hadoop Data
Create the directory (the path must be absolute, otherwise hadoop fs creates it relative to the user's HDFS home):
> hadoop fs -mkdir -p /user/carl/sparkR
Upload the file:
> cd /home/carl/install/spark-1.4.1-bin-hadoop2.6/examples/src/main/resources
> hadoop fs -put ./people.json /user/carl/sparkR/
3. This R Script Runs on the Hadoop Cluster
#install.packages("yaml", repos="http://cran.rstudio.com/")
library("yaml")

# Load the cluster settings from the YAML config
config <- yaml.load_file("config.yaml")
spark_home <- config$spark$home
spark_r_location <- paste0(spark_home, "/R/lib")
spark_server <- config$spark$server

# Load the SparkR package shipped with the Spark distribution
library("SparkR", lib.loc = spark_r_location)

# Initialize the Spark context and the SQL context
sc <- sparkR.init(master = spark_server, appName = "SparkR_Wordcount",
                  sparkHome = spark_home)
sqlContext <- sparkRSQL.init(sc)

# Read the JSON file from HDFS (path is relative to the user's HDFS home)
path <- file.path("sparkR/people.json")
peopleDF <- jsonFile(sqlContext, path)
printSchema(peopleDF)
head(peopleDF)
This runs both in RStudio and via Rscript.
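The app is named SparkR_Wordcount, and the launcher command above invokes scripts/WordCount.R; the distributed word-count itself follows the SparkR-pkg example linked under References. For orientation, the counting logic can be sketched in plain local R — this is only an illustration, not the cluster script:

```r
# A local, non-distributed sketch of word counting in base R.
# The input lines here are made-up sample data.
text <- c("to be or not to be", "to be is to do")

# Split each line on whitespace, flatten into one vector, count occurrences
words <- unlist(strsplit(text, "[[:space:]]+"))
counts <- table(words)

counts
```

In the SparkR version the split and count steps become distributed transformations over an RDD instead of a local vector, but the shape of the computation is the same.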
Tips
1. Error Message:
trying to use CRAN without setting a mirror
Solution:
install.packages("yaml", repos="http://cran.rstudio.com/")
Passing the repos argument explicitly fixes the problem.
References:
http://www.mayin.org/ajayshah/KB/R/
http://stackoverflow.com/questions/5272846/how-to-get-parameters-from-config-file-in-r-script
Wordcount example:
https://github.com/amplab-extras/SparkR-pkg/blob/master/examples/wordcount.R
Reposted from sillycat.iteye.com/blog/2242559