Elasticsearch ships a support library (elasticsearch-hadoop / elasticsearch-spark) that makes it easy to work with ES from Spark. First, add the ES dependency to your Maven pom.xml:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>6.2.0</version>
    <exclusions>
        <exclusion>
            <artifactId>log4j-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>
Then, in the Spark program, configure a SparkConf with the ES connection properties:
val sparkconf = new SparkConf().setAppName("sevs_spark3")
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
  .set("HADOOP_USER_NAME", getProp("hbase.hadoop.username"))   // getProp is a project-local config helper
  .set("HADOOP_GROUP_NAME", getProp("hbase.hadoop.groupname"))
  .set("es.index.auto.create", "true")   // create the target index if it does not exist
  .set("es.nodes", "127.0.0.1")          // ES node(s) to connect to
  .set("es.port", "9200")                // ES HTTP port
  .setMaster("local")
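As a small sketch of the step that follows (the variable name is illustrative), the conf above is used to build the SparkContext that the read/write functions below take as a parameter; es.nodes also accepts a comma-separated host list when connecting to a multi-node cluster:

```scala
import org.apache.spark.SparkContext

// Build the context from the conf configured above; everything that
// reads or writes ES goes through this SparkContext.
val sc = new SparkContext(sparkconf)
```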
Finally, read from and write to ES through esRDD / saveToEs, which is very convenient. Note that you must import org.elasticsearch.spark._ to bring these implicit methods into scope:
import org.elasticsearch.spark._

def read_es(sc: SparkContext): Unit = {
  // Each element is a (documentId, fieldMap) pair.
  val rdd = sc.esRDD("test/login")
  rdd.foreach { case (id, doc) =>
    println("######", id, doc)
  }
}
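esRDD also takes an optional second argument, a query that is pushed down to ES so only matching documents are shipped back to Spark. A minimal sketch, assuming the same test/login index and its hostIp field:

```scala
import org.apache.spark.SparkContext
import org.elasticsearch.spark._

def read_es_with_query(sc: SparkContext): Unit = {
  // URI query syntax; a full JSON query-DSL string is accepted as well.
  val rdd = sc.esRDD("test/login", "?q=hostIp:abc")
  rdd.foreach { case (id, doc) => println(id, doc) }
}
```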
def save_es(sc: SparkContext): Unit = {
  sc.parallelize(Seq("abc", "def"))
    .map(x => Map("hostIp" -> x, "remoteIp" -> x.concat("#")))
    .saveToEs("snprime_login/login")   // target index/type
}
Working with ES from Spark really is that simple. Give it a try.