External frameworks for Spark

Abstract: The Spark community provides a large number of frameworks and libraries, and their number keeps growing. This article introduces several external frameworks that are not included in the Spark core source code repository. The problems Spark is used to solve span many different domains, and using these frameworks can reduce initial development costs and let developers leverage knowledge they already have.

Spark Packages

  To use these libraries, the first thing you need to understand is Spark Packages, which is something like Spark's package manager. When you submit a job to a Spark cluster, the packages you need can be downloaded from the site where Spark packages are hosted. All packages are stored on this site:
  http://spark-packages.org/
  When you want to use a Spark package, add the --packages option to the spark-submit or spark-shell command:

$ $SPARK_HOME/bin/spark-shell \
    --packages com.databricks:spark-avro_2.10:2.0.1
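  Once the package is on the classpath, it can be used from the shell like any other data source. The snippet below is a minimal sketch, assuming Spark 1.x (where spark-shell provides a sqlContext) and a hypothetical Avro file named episodes.avro:

// Read an Avro file through the spark-avro data source (Spark 1.x API).
// "episodes.avro" is a placeholder path; sqlContext is created by spark-shell.
val df = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("episodes.avro")

// The result is an ordinary DataFrame.
df.printSchema()
df.show(5)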
  If the --packages option is used, Spark downloads the package and automatically adds its JARs to the classpath. Not only can you use community libraries on your Spark cluster, you can also publish your own libraries publicly. If you want to publish a Spark package to this hosting service, the following rules must be followed:

The source code must be on GitHub.
The name of the repository must be the same as the package name.
The master branch of the repository must have a README.md file, and there must be a LICENSE file in the root directory.
In other words, you don't need to compile the package yourself. Even if you use the Spark Packages template, compilation, release, and version updates are handled by this service. The sbt plugin sbt-spark-package (https://github.com/databricks/sbt-spark-packages) is also very useful for generating packages. To include this plugin in your project, add the following to the project/plugins.sbt file of your sbt project:

resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.3")
The following settings are written in build.sbt:

spName - the name of the package.
sparkVersion - the Spark version the package depends on.
sparkComponents - the list of Spark components the package depends on, such as SQL and MLlib.
spShortDescription - a one-sentence description of the package.
spDescription - a full description of the package.
spHomePage - the URL of a web page describing the package.
The above six items are the information you need to provide before publishing the package; a build.sbt that sets them is sketched below. Be sure to publish from the master branch of the package's repository. You can do this through the Web UI of the Spark Packages hosting site (https://spark-packages.org/).
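  As a sketch, a build.sbt combining these settings could look like the following. The package name, version, and URL are hypothetical placeholders, and the exact setting-key spellings follow the list above; verify them against the sbt-spark-package documentation before use:

// Hypothetical build.sbt fragment for a Spark package.
// "myorg/my-spark-lib" and the homepage are placeholders, not a real package.
spName := "myorg/my-spark-lib"

sparkVersion := "1.5.2"

sparkComponents ++= Seq("sql", "mllib")

spShortDescription := "Helper utilities for Spark SQL and MLlib"

spDescription := "A longer description of what the package provides and how to use it."

spHomePage := "https://github.com/myorg/my-spark-lib"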
  After registering with your GitHub account on the Spark Packages site, you can select your repository from the "name" drop-down menu.
  The short description and homepage you enter should preferably match the spShortDescription and spHomePage in build.sbt. Once you submit the package, the verification process begins; it usually takes a few minutes. When verification is complete, you will receive an email telling you whether it succeeded. If it did, you can download your package with the --packages option described earlier. As of November 2015, there were 153 packages on the Spark Packages site. The next section introduces some libraries that also support the Spark package format, i.e. they are also distributed as Spark packages.

Original link: http://click.aliyun.com/m/23423/
