I. Introduction
Before a Storm topology can be submitted to a server cluster, the project must be packaged. This article compares the various packaging approaches and describes the points to note during the packaging process. There are three main ways to package:
- The first: use mvn package directly, without any plugins;
- The second: use the maven-assembly-plugin;
- The third: use the maven-shade-plugin.
Each is described in detail below.
II. mvn package
2.1 Limitations of mvn package
With no plugins configured in the POM, you can use mvn package directly to package the project, which works for projects without external dependencies.
But if the project uses third-party JARs, there is a problem: the JAR produced by mvn package does not contain the dependencies, so submitting it to the server and running it results in an exception about missing third-party classes.
Is there a solution if you want to package this way but still use third-party JARs? Yes; it is explained in the Command Line Client chapter of the official documentation, and the main approach is as follows.
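To illustrate the limitation, here is a POM fragment declaring a third-party dependency (jedis is used only as an example, taken from the command shown later in this article); mvn package will compile against it, but will not bundle it into the output JAR:

```xml
<!-- A third-party dependency: mvn package compiles against it,
     but the resulting JAR will NOT contain the jedis classes. -->
<dependencies>
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.9.0</version>
    </dependency>
</dependencies>
```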
2.2 Solution
When submitting a topology with storm jar, third-party dependencies can be specified in the following ways:
- If the third-party JAR is available locally, use --jars to specify it;
- If the third-party JAR is in the remote central repository, use --artifacts to specify it; if you want to exclude certain transitive dependencies, use the ^ symbol. Storm will automatically download the artifacts from the central repository and cache them locally;
- If the third-party JAR is in some other repository, you also need --artifactRepositories to specify the repository address; the repository name and its address are separated by ^.
The following is an example command covering all three cases:
./bin/storm jar example/storm-starter/storm-starter-topologies-*.jar \
org.apache.storm.starter.RollingTopWords blobstore-remote2 remote \
--jars "./external/storm-redis/storm-redis-1.1.0.jar,./external/storm-kafka/storm-kafka-1.1.0.jar" \
--artifacts "redis.clients:jedis:2.9.0,org.apache.kafka:kafka_2.10:0.8.2.2^org.slf4j:slf4j-log4j12" \
--artifactRepositories "jboss-repository^http://repository.jboss.com/maven2, \
HDPRepo^http://repo.hortonworks.com/content/groups/public/"
This approach assumes the server can reach the external network. If it cannot, or if you want to package the project into a single all-in-one JAR that contains all of its dependencies, you can use one of the two plugins described below.
III. The maven-assembly-plugin
The maven-assembly-plugin packaging method is described in the official documentation, in the chapter Running Topologies on a Production Cluster:
If you're using Maven, the Maven Assembly Plugin can do the packaging for you. Just add this to your pom.xml:
<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>com.path.to.main.Class</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>
Then run mvn assembly:assembly to get an appropriately packaged jar. Make sure you exclude the Storm jars since the cluster already has Storm on the classpath.
The official documentation makes the following main points:
- The maven-assembly-plugin can bundle all of the project's dependencies into the final JAR;
- The Storm JARs need to be excluded, since the cluster environment already provides them;
- The <mainClass> tag specifies the main entry class;
- The <descriptorRef> tag specifies the packaging configuration.
jar-with-dependencies is a predefined basic packaging configuration in Maven, defined by the following XML file:
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0
http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    <id>jar-with-dependencies</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>
We can extend this configuration file to achieve more functionality, such as excluding specified JARs. Usage example:
1. Introduce the plugin
Introduce the plugin in pom.xml and specify the packaging configuration file assembly.xml (the name can be customized):
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptors>
<descriptor>src/main/resources/assembly.xml</descriptor>
</descriptors>
<archive>
<manifest>
<mainClass>com.heibaiying.wordcount.ClusterWordCountApp</mainClass>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
The assembly.xml file extends jar-with-dependencies.xml and uses the <excludes> tag to exclude the Storm JARs. Its content is as follows:
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0
http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    <id>jar-with-dependencies</id>
    <!-- specify the packaging format -->
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <!-- exclude storm-core, which the Storm cluster already provides -->
            <excludes>
                <exclude>org.apache.storm:storm-core</exclude>
            </excludes>
        </dependencySet>
    </dependencySets>
</assembly>
In the configuration file you can exclude not only dependencies but also specified files; more configuration rules can be found in the official document: Descriptor Format.
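As a sketch of file-level exclusion (the log4j.properties pattern is only an illustrative choice), a <dependencySet> can carry <unpackOptions> to drop individual files while unpacking dependencies:

```xml
<dependencySet>
    <outputDirectory>/</outputDirectory>
    <unpack>true</unpack>
    <!-- While unpacking dependencies, drop any bundled log4j.properties
         so they cannot shadow the project's own logging configuration. -->
    <unpackOptions>
        <excludes>
            <exclude>**/log4j.properties</exclude>
        </excludes>
    </unpackOptions>
</dependencySet>
```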
2. Package command
The packaging command for the maven-assembly-plugin is as follows:
# mvn assembly:assembly
Packaging produces two JAR files at once; the one with the jar-with-dependencies suffix is the JAR containing the third-party dependencies. The suffix comes from the <id> tag in assembly.xml and can be customized. This JAR can be submitted directly to the cluster environment.
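The assembled JAR is then submitted in the usual way. A sketch, assuming a hypothetical artifact name (the main class is the one configured in the pom.xml above; the actual JAR file name depends on your project's artifactId and version):

```shell
# Submit the assembled JAR to the cluster; the file name is hypothetical.
storm jar target/wordcount-1.0-jar-with-dependencies.jar \
    com.heibaiying.wordcount.ClusterWordCountApp
```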
IV. The maven-shade-plugin
4.1 Official documentation
The third way is to use the maven-shade-plugin. Since we already have the maven-assembly-plugin, why do we need the maven-shade-plugin? This is also explained in the official documentation, in the Storm HDFS Integration chapter, which reads as follows:
When packaging your topology, it's important that you use the maven-shade-plugin as opposed to the maven-assembly-plugin.
The shade plugin provides facilities for merging JAR manifest entries, which the hadoop client leverages for URL scheme resolution.
If you experience errors such as the following:
java.lang.RuntimeException: Error preparing HdfsBolt: No FileSystem for scheme: hdfs
it's an indication that your topology jar file isn't packaged properly.
If you are using Maven to create your topology jar, you should use the following maven-shade-plugin configuration to create your topology jar.
The first sentence is quite clear: when integrating with HDFS, you must use the maven-shade-plugin instead of the maven-assembly-plugin, otherwise the RuntimeException above will be thrown.
The maven-shade-plugin has other advantages as well. A project may depend on many JARs, which in turn depend on further JARs, so the project can end up with different versions of the same JAR, or with multiple JARs containing resource files of the same name. In that case the shade plugin attempts to merge all the resource files during packaging, rather than overwriting them the way the assembly plugin does.
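For example (a sketch; reference.conf stands in here for any duplicated resource name), shade's AppendingTransformer concatenates same-named resource files from different JARs instead of keeping only one copy:

```xml
<!-- Merge every reference.conf found across dependencies into one
     concatenated file in the shaded JAR, instead of overwriting. -->
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
</transformer>
```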
4.2 Configuration
An example configuration for packaging with the maven-shade-plugin is as follows:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<createDependencyReducedPom>true</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.sf</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.dsa</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/*.rsa</exclude>
<exclude>META-INF/*.EC</exclude>
<exclude>META-INF/*.ec</exclude>
<exclude>META-INF/MSFTSIG.SF</exclude>
<exclude>META-INF/MSFTSIG.RSA</exclude>
</excludes>
</filter>
</filters>
<artifactSet>
<excludes>
<exclude>org.apache.storm:storm-core</exclude>
</excludes>
</artifactSet>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
The above configuration example comes from the Storm GitHub repository; some explanation is in order.
The configuration excludes certain files because some JARs are generated with signature files produced by jarsigner (for integrity checking), stored as two kinds of files in the META-INF directory:
- a signature file, with a .SF extension;
- a signature block file, with a .DSA, .RSA, or .EC extension.
If such packages are pulled in repeatedly, packaging may fail with an Invalid signature file digest for Manifest main attributes exception, so these files are excluded in the configuration.
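You can check whether a JAR contains such signature entries with standard zip tools, since a JAR is just a zip archive. A minimal sketch (the file names here are fabricated for the demo; against a real artifact only the final unzip/grep line is needed):

```shell
# Build a scratch jar containing fake signature files, then list them --
# the same check you would run against a real topology jar.
mkdir -p scratch/META-INF
echo "dummy" > scratch/META-INF/DEMO.SF
echo "dummy" > scratch/META-INF/DEMO.RSA
(cd scratch && zip -qr ../scratch.jar META-INF)
unzip -l scratch.jar | grep -E 'META-INF/.*\.(SF|DSA|RSA|EC)$'
```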
4.3 Package command
When packaging with the maven-shade-plugin, the command is the same as the ordinary packaging command:
# mvn package
After packaging, two JAR files are generated; when submitting to the server cluster, use the JAR whose name does not begin with original-.
V. Conclusion
Having described the three packaging methods in detail, here is the conclusion: the maven-shade-plugin is recommended, because it is the most versatile, it is simple to use, and all the examples in the Storm GitHub repository are packaged with it.
VI. Packaging Considerations
No matter which packaging method you use, you must exclude the Storm JARs that the cluster environment already provides. The typical one is storm-core, which is already present in the lib directory of the Storm installation.
If you do not exclude storm-core, the following exception is usually thrown:
Caused by: java.lang.RuntimeException: java.io.IOException: Found multiple defaults.yaml resources.
You're probably bundling the Storm jars with your topology jar.
[jar:file:/usr/app/apache-storm-1.2.2/lib/storm-core-1.2.2.jar!/defaults.yaml,
jar:file:/usr/appjar/storm-hdfs-integration-1.0.jar!/defaults.yaml]
at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:384)
at org.apache.storm.utils.Utils.readDefaultConfig(Utils.java:428)
at org.apache.storm.utils.Utils.readStormConfig(Utils.java:464)
at org.apache.storm.utils.Utils.<clinit>(Utils.java:178)
... 39 more
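An alternative to the plugin-level exclusions shown above is to declare storm-core with provided scope in the POM, so that neither plugin bundles it in the first place (a sketch; the version number is an example and should match your cluster):

```xml
<!-- provided scope: available at compile time, but not packaged,
     because the Storm cluster already ships storm-core in its lib/. -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>1.2.2</version>
    <scope>provided</scope>
</dependency>
```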
Reference material
For more on maven-shade-plugin configuration, see: maven-shade-plugin Getting Started
More articles in this big data series can be found in the GitHub open source project: Big Data Getting Started