Spark Serialization Tuning

  • Serialization is the process of converting an object into a sequence of bytes; deserialization is the reverse process of restoring the object from that byte sequence. Typical uses:

    • Persisting data: an object is serialized and written to disk so that it is preserved permanently

    • Remote communication: objects are transmitted over the network as byte sequences

  • In Spark, serialization and deserialization mainly come into play in three places:

    • Broadcast variables: when a variable is broadcast with the broadcast operator (SparkContext.broadcast), it is serialized so that it can be sent over the network

    • Custom types as RDD elements: when a custom class is used as the element type of an RDD, its objects are serialized whenever Spark has to move or store them as bytes (for example during a shuffle), so with Spark's default Java serializer the custom class must implement Serializable

    • Serialized persistence: when an RDD is persisted with a serialized storage level such as MEMORY_ONLY_SER, Spark serializes each partition into one large byte array
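
A minimal sketch of the object-to-bytes round trip described in the first bullet, using only the JDK's built-in Java serialization (ObjectOutputStream and ObjectInputStream) rather than anything Spark-specific. The Point class and the RoundTrip helper are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: converts objects to bytes and back with JDK serialization.
final class RoundTrip {
    // Serialization: object -> sequence of bytes.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } // closing the stream flushes everything into the buffer
        return buffer.toByteArray();
    }

    // Deserialization: sequence of bytes -> restored object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes(new Point(3, 4));
        Point copy = (Point) fromBytes(bytes);
        System.out.println(bytes.length + " bytes -> Point(" + copy.x + ", " + copy.y + ")");
    }
}

// Hypothetical value type; implementing Serializable opts it in to JDK serialization.
final class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```

The same byte array could just as well be written to a file (persistence) or a socket (remote communication); only the destination of the bytes changes.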
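
The Serializable requirement for custom RDD element types can be reproduced outside Spark. The sketch below is plain JDK code, not Spark's serializer, and PlainUser, SerializableUser and SerializableCheck are hypothetical names. It shows Java serialization rejecting a class that does not implement Serializable with java.io.NotSerializableException, which is the kind of failure Spark reports when its default Java serializer meets such a class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical checker: tries to serialize an object with JDK serialization.
final class SerializableCheck {
    // Returns true if serialization succeeds, false if the object's class
    // is rejected with NotSerializableException.
    static boolean canSerialize(Object obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new PlainUser("alice")));        // false
        System.out.println(canSerialize(new SerializableUser("alice"))); // true
    }
}

// Hypothetical custom type that does NOT implement Serializable.
final class PlainUser {
    final String name;

    PlainUser(String name) {
        this.name = name;
    }
}

// The same data, but this class implements Serializable.
final class SerializableUser implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;

    SerializableUser(String name) {
        this.name = name;
    }
}
```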
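
To picture what a serialized storage level does, the sketch below packs every record of one simulated partition into a single byte[] and reads it back. It is an analogy built on JDK serialization, not Spark's storage code: Event and PartitionBytes are hypothetical names, and Spark itself writes partitions with whichever serializer it is configured to use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that stores a whole "partition" as one byte array.
final class PartitionBytes {
    // Serialize the record count, then every record, into a single byte[].
    static byte[] serializePartition(List<Event> partition) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeInt(partition.size());
            for (Event e : partition) {
                out.writeObject(e);
            }
        }
        return buffer.toByteArray();
    }

    // Reading the partition back means deserializing every record again.
    static List<Event> deserializePartition(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            int n = in.readInt();
            List<Event> records = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                records.add((Event) in.readObject());
            }
            return records;
        }
    }

    public static void main(String[] args) throws Exception {
        List<Event> partition = new ArrayList<>();
        for (long i = 0; i < 1000; i++) {
            partition.add(new Event(i, i % 2 == 0 ? "click" : "view"));
        }
        byte[] block = serializePartition(partition);
        List<Event> restored = deserializePartition(block);
        System.out.println(partition.size() + " objects stored as one byte[] of "
                + block.length + " bytes; restored " + restored.size());
    }
}

// Hypothetical record type standing in for the elements of one RDD partition.
final class Event implements Serializable {
    private static final long serialVersionUID = 1L;
    final long id;
    final String kind;

    Event(long id, String kind) {
        this.id = id;
        this.kind = kind;
    }
}
```

This is the usual trade-off of serialized caching: one byte array per partition means far fewer live objects and less garbage-collection pressure, at the cost of deserializing the records again whenever the partition is read.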

Origin: www.cnblogs.com/xiangyuguan/p/11361619.html