Spark broadcast variables


An intuitive way to understand broadcast variables:
Suppose there are 10 Chinese characters you cannot read. You look each one up in a dictionary and write down its pinyin; once all 10 characters have been looked up, the program outputs the final result. But what if you are given only half of the dictionary? Some characters may not be found in that half, which produces abnormal data. Broadcast variables exist to solve this: the separate parts of the dictionary are merged into one complete dictionary for every lookup, which ensures that the output data is correct.
Put even more simply: there are five executors. Each executor gives the driver 1 dollar, and then the driver gives every executor the full 5 dollars.
Now let's understand it together with the code:
Large amounts of data usually live on HDFS. HDFS is a distributed storage system, so a file is split into several parts when it is stored. When you use the file, an executor may be handed only the part of the data that sits on one machine, so we have to merge the parts ourselves. There are three main steps.

Step one: the data is distributed, so call .collect to gather it into one complete copy on the driver.

import org.apache.spark.rdd.RDD

// Parse the IP rule data (ipdict is the RDD of rule lines that was read earlier)
val ipdic: RDD[(Long, Long, String, String)] = ipdict.map(line => {
  val fields: Array[String] = line.split("[|]")
  val startnum: Long = fields(2).toLong
  val endnum: Long = fields(3).toLong
  val province = fields(6)
  val city = fields(7)
  (startnum, endnum, province, city)
})
// Trigger an action: collect() gathers the data from every partition onto the driver
val tupipddic: Array[(Long, Long, String, String)] = ipdic.collect()

Step two: call sc.broadcast, passing in as its argument the data that was just collected on the driver.

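import org.apache.spark.broadcast.Broadcast

// Then broadcast the data that was just collected on the driver;
// Spark makes the complete copy available to every executor
val broadcastdic: Broadcast[Array[(Long, Long, String, String)]] = sc.broadcast(tupipddic)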

Step three: call .value on the broadcast variable to get the complete broadcast data.

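// The rule matching runs on the executors: through the reference that
// sc.broadcast returned on the driver, each one gets the complete IP dictionary
val valu: Array[(Long, Long, String, String)] = broadcastdic.value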
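
The three steps above can be sketched end to end as follows. This is a minimal sketch, not the author's full program: the object name BroadcastSketch, the helper parseRule, the local[2] master, the two rule lines, and the sample numeric IPs are all made up for illustration. Only the field positions (2, 3, 6, 7) and the collect, sc.broadcast, .value sequence come from the snippets above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

object BroadcastSketch {
  // Hypothetical helper: same field positions as the snippet in step one
  def parseRule(line: String): (Long, Long, String, String) = {
    val fields = line.split("[|]")
    (fields(2).toLong, fields(3).toLong, fields(6), fields(7))
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch can run on a single machine
    val conf = new SparkConf().setAppName("BroadcastSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up rule lines standing in for the file on HDFS
      val ruleLines: RDD[String] = sc.parallelize(Seq(
        "a|b|100|199|c|d|ProvinceA|CityA",
        "a|b|200|299|c|d|ProvinceB|CityB"
      ))

      // Step one: gather the distributed rules into one array on the driver
      val rules: Array[(Long, Long, String, String)] = ruleLines.map(parseRule).collect()

      // Step two: broadcast the complete array
      val rulesBc: Broadcast[Array[(Long, Long, String, String)]] = sc.broadcast(rules)

      // Step three: read the complete dictionary inside a task via .value
      val ips: RDD[Long] = sc.parallelize(Seq(150L, 250L, 999L)) // made-up numeric IPs
      val located: Array[(Long, String)] = ips.map { ip =>
        val dict = rulesBc.value
        val province = dict
          .find { case (lo, hi, _, _) => ip >= lo && ip <= hi }
          .map(_._3)
          .getOrElse("unknown")
        (ip, province)
      }.collect()

      located.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The linear find is only there to keep the sketch short; with a large rule set a faster lookup would probably be preferable.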

Step four: the merged data can now be used in whatever operations you need.
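
As one possible use of the merged dictionary, the sketch below looks up the rule whose range contains a given IP. The article does not say how the lookup is done, so everything here is an assumption: the names IpLookup, ip2Long and findRule are hypothetical, the IP is assumed to be a dotted IPv4 string, and binary search is only valid if the rules are sorted by start value and their ranges do not overlap.

```scala
object IpLookup {
  // Same tuple shape as the broadcast array: (start, end, province, city)
  type Rule = (Long, Long, String, String)

  // Convert a dotted IPv4 string to its numeric form, one byte per segment
  def ip2Long(ip: String): Long =
    ip.split("[.]").foldLeft(0L)((acc, part) => (acc << 8) | part.toLong)

  // Binary search over the rules.
  // Assumes the array is sorted by start value and the ranges do not overlap.
  def findRule(rules: Array[Rule], ip: Long): Option[Rule] = {
    var low = 0
    var high = rules.length - 1
    while (low <= high) {
      val mid = low + (high - low) / 2
      val (start, stop, _, _) = rules(mid)
      if (ip < start) high = mid - 1      // target lies in the lower half
      else if (ip > stop) low = mid + 1   // target lies in the upper half
      else return Some(rules(mid))        // start <= ip <= stop: match
    }
    None // no range contains this IP
  }
}
```

Inside a task this would be called on the array obtained from broadcastdic.value, for example IpLookup.findRule(valu, IpLookup.ip2Long(someIp)), where someIp is a placeholder for a field of the record being processed.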

Origin blog.csdn.net/weixin_45896475/article/details/104382849