Apache Spark 2.4.4 released, an open-source cluster computing framework

Apache Spark is an open-source cluster computing framework similar to Hadoop, but with some useful differences that make Spark superior for certain workloads: Spark supports in-memory distributed datasets, which, in addition to enabling interactive queries, also optimizes iterative workloads.
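The benefit of in-memory datasets for iterative workloads can be shown with a minimal plain-Python sketch (not Spark code; all names here are illustrative). A lazy "dataset" is re-read from its source on every pass unless it is kept in memory:

```python
# Conceptual sketch: why caching a dataset in memory speeds up
# iterative workloads. load_dataset() simulates an expensive source
# read and counts how often it actually runs.

compute_count = 0

def load_dataset():
    """Simulates an expensive read from storage."""
    global compute_count
    compute_count += 1
    return list(range(1000))

def iterate(n_passes, cached):
    """Runs n_passes over the dataset, re-reading it unless cached."""
    data = load_dataset() if cached else None
    total = 0
    for _ in range(n_passes):
        rows = data if cached else load_dataset()
        total += sum(rows)
    return total

uncached_total = iterate(5, cached=False)
reads_uncached = compute_count   # one source read per pass

compute_count = 0
cached_total = iterate(5, cached=True)
reads_cached = compute_count     # a single source read, reused
```

With five passes, the uncached run reads the source five times while the cached run reads it once and produces the same result; this is the effect Spark's in-memory RDD caching exploits for iterative algorithms.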

Version 2.4.4 is a maintenance release containing stability fixes, including the following:

  • Fix Decimal toScalaBigInt / toJavaBigInteger not working for decimals that do not fit in a long
  • Fix a nullability problem in PushProjectionThroughUnion
  • Fix from_avro so that it does not modify variables in other rows in local mode
  • Fix Spark 2.4.3 throwing an unexpected NPE when a HiveUDAF encounters 0 rows; after the fix it returns NULL, as in other versions
  • Fix synchronization between the PySpark socket server and the JVM connection thread
  • Fix KafkaOffsetRangeCalculator.getRange possibly dropping offsets
  • Fix caching a non-deterministic RDD leading to incorrect results when a stage is rerun
  • Deprecate LinearSVCModel.setWeightCol: the method, introduced in Spark 2.2, never worked correctly; it is deprecated in 2.4.4 and will be removed in 3.0.0
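The first item above concerns converting a decimal to an arbitrary-precision integer. A minimal plain-Python sketch of that bug class (not Spark's actual code; the function names are illustrative) shows why routing the value through a signed 64-bit long truncates anything that does not fit:

```python
from decimal import Decimal

def to_bigint_via_long(d):
    # Buggy path: the value is squeezed through a signed 64-bit long,
    # wrapping around for anything outside [-2**63, 2**63 - 1].
    v = int(d) & 0xFFFFFFFFFFFFFFFF      # keep only the low 64 bits
    return v - 2**64 if v >= 2**63 else v

def to_bigint_direct(d):
    # Fixed path: convert directly to an arbitrary-precision integer.
    return int(d)

big = Decimal(2) ** 70  # does not fit in a signed 64-bit long
assert to_bigint_via_long(big) != to_bigint_direct(big)
assert to_bigint_via_long(Decimal(42)) == 42  # small values are unaffected
```

Values that fit in a long are unaffected, which is why the bug only surfaces for large decimals.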

Full details are in the release notes:

https://spark.apache.org/releases/spark-release-2-4-4.html

Source: www.oschina.net/news/109702/spark-2-4-4-released