.NET for Apache Spark 1.0 released

.NET for Apache Spark 1.0 has been released. It is a .NET framework for big data processing on Spark that lets .NET developers use Apache Spark easily.

The project is led by Microsoft and the .NET Foundation and has been in development for about two years. Microsoft announced .NET for Apache Spark at the Spark + AI Summit in 2019 and released the first preview version, v0.1.0.

Version 1.0 includes the following:

  • Support for .NET applications targeting .NET Standard 2.0 (.NET Core 3.1 or higher is recommended).
  • Support for the Apache Spark 2.4/3.0 DataFrame API, including the ability to write Spark SQL. For example:
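// Requires: using Microsoft.Spark.Sql; using static Microsoft.Spark.Sql.Functions;
var spark = SparkSession.Builder().GetOrCreate();
// inputfile is the path to a CSV file of tweets
var tweets = spark.Read().Schema("date STRING, time STRING, author STRING, tweet STRING").Format("csv").Load(inputfile);
// Count tweets per (lower-cased) author, most active authors first
tweets = tweets.GroupBy(Lower(Col("author")).As("author"))
               .Agg(Count("tweet").As("tweetcount"))
               .OrderBy(Desc("tweetcount"));
tweets.Write().SaveAsTable("tweetcount");
spark.Sql(@"SELECT * FROM tweetcount").Show();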
  • Ability to write Apache Spark applications using .NET user-defined functions (UDFs). For example:
// Define and register UDF
var concat = Udf<int?, string, string>((age, name) => name + age);

// Use UDF
df.Filter(df["age"] > 21).Select(concat(df["age"], df["name"])).Show();
  • An API extension framework for adding support for other Spark libraries. Extensions currently cover Linux Foundation Delta Lake, Microsoft OSS Hyperspace, ML.NET, and Apache Spark's MLlib functionality.
  • Performance work on moving data between the Spark runtime and .NET UDFs, including improved pickling interop and support for Apache Arrow.
  • Competitive performance: .NET for Apache Spark applications that do not use UDFs run at the same speed as non-UDF Spark applications written in Scala or PySpark. For applications that do use UDFs, the .NET for Apache Spark program is at least as fast as the PySpark program, and generally faster.
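
The UDF snippet in the list above assumes an existing DataFrame named df. As a minimal sketch of how it might sit inside a complete console program, the example below builds a small DataFrame from an inline SQL table instead of reading a file. The namespace, class name, application name, and sample rows are made up for illustration and are not part of the release announcement.

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace UdfSketch // hypothetical namespace, for illustration only
{
    class Program
    {
        static void Main(string[] args)
        {
            // Attach to (or create) the Spark session for this application.
            SparkSession spark = SparkSession.Builder()
                .AppName("udf-sketch") // assumed application name
                .GetOrCreate();

            // Illustrative in-memory data: two rows with an INT age and a STRING name.
            DataFrame df = spark.Sql(
                "SELECT * FROM VALUES (30, 'Andy'), (19, 'Justin') AS people(age, name)");

            // Define the UDF: takes a nullable int and a string, returns a string.
            Func<Column, Column, Column> concat =
                Udf<int?, string, string>((age, name) => name + age);

            // Keep rows with age above 21, then apply the UDF to the remaining rows.
            df.Filter(df["age"] > 21)
              .Select(concat(df["age"], df["name"]))
              .Show();

            spark.Stop();
        }
    }
}
```

A program like this is not run directly. It is launched through spark-submit together with the .NET for Apache Spark runner jar that matches the installed Spark version.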

Download link: https://www.nuget.org/packages/Microsoft.Spark

Origin www.oschina.net/news/119553/net-1-0-for-apache-spark-released