Storm Series (6): A Comparative Analysis of Three Ways to Package a Storm Project

I. Introduction

Before a Storm topology is submitted to run on a server cluster, the project must be packaged. This article compares the various packaging methods and describes the points that need attention during packaging. There are three main ways to package:

  • First: use mvn package directly, without any plugins;
  • Second: use the maven-assembly-plugin to package;
  • Third: use the maven-shade-plugin to package.

Each is described in detail below.

II. mvn package

2.1 Limitations of mvn package

If the POM has no plugin configuration, the project can be packaged directly with mvn package. This works only for projects without external dependencies.

But if the project uses third-party JARs, there is a problem: the JAR produced by mvn package does not contain the dependencies, so submitting it to run on the server raises an exception because the third-party dependencies cannot be found.

Is there a solution if you want to package this way while still using third-party JARs? Yes: the Command Line Client chapter of the official documentation explains it. The main solution is as follows.

2.2 Solution

When submitting a topology with storm jar, third-party dependencies can be specified in the following ways:

  • If the third-party JARs are available locally, specify them with --jars;
  • If the third-party JARs are in the remote central repository, specify them with --artifacts; to exclude certain transitive dependencies, use the ^ symbol. Storm then downloads the specified artifacts from the central repository and caches them locally;
  • If the third-party JARs are in some other repository, you also need --artifactRepositories to specify the repository; its name and address are separated by ^.

The following is an example command covering all three cases:

./bin/storm jar example/storm-starter/storm-starter-topologies-*.jar \
org.apache.storm.starter.RollingTopWords blobstore-remote2 remote  \
--jars "./external/storm-redis/storm-redis-1.1.0.jar,./external/storm-kafka/storm-kafka-1.1.0.jar" \
--artifacts "redis.clients:jedis:2.9.0,org.apache.kafka:kafka_2.10:0.8.2.2^org.slf4j:slf4j-log4j12" \
--artifactRepositories "jboss-repository^http://repository.jboss.com/maven2, \
HDPRepo^http://repo.hortonworks.com/content/groups/public/"

This approach assumes the server can connect to the external network. If it cannot, or if you want to package the project directly into a single all-in-one JAR containing all of its dependencies, use one of the two plugins described below.

III. The maven-assembly-plugin Plugin

The maven-assembly-plugin packaging method comes from the official documentation, Running Topologies on a Production Cluster:

If you're using Maven, the Maven Assembly Plugin can do the packaging for you. Just add this to your pom.xml:

<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>com.path.to.main.Class</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>

Then run mvn assembly:assembly to get an appropriately packaged jar. Make sure you exclude the Storm jars since the cluster already has Storm on the classpath.

The official documentation makes the following main points:

  • maven-assembly-plugin can pack all dependencies into the final JAR;
  • The Storm JARs need to be excluded, since the cluster environment already provides them;
  • The <mainClass> tag specifies the main entry class;
  • The <descriptorRef> tag specifies the packaging configuration.

jar-with-dependencies is a basic packaging configuration predefined by Maven; it is an XML file with the following content:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0
                              http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    <id>jar-with-dependencies</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>

We can extend this configuration file to achieve more functionality, such as excluding specific JARs. Usage example:

1. Introduce the plugin

Introduce the plugin in pom.xml and specify the packaging configuration file assembly.xml (the name can be customized):

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptors>
                    <descriptor>src/main/resources/assembly.xml</descriptor>
                </descriptors>
                <archive>
                    <manifest>
                        <mainClass>com.heibaiying.wordcount.ClusterWordCountApp</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>

assembly.xml extends jar-with-dependencies.xml and uses the <excludes> tag to exclude the Storm JARs. Its content is as follows:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 
                              http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    
    <id>jar-with-dependencies</id>

    <!-- Specify the packaging format -->
    <formats>
        <format>jar</format>
    </formats>

    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <!-- Exclude storm-core, which the Storm environment already provides -->
            <excludes>
                <exclude>org.apache.storm:storm-core</exclude>
            </excludes>
        </dependencySet>
    </dependencySets>
</assembly>

The configuration file can exclude not only dependencies but also specific files; for more configuration rules, refer to the official documentation: Descriptor Format

2. Packaging command

The packaging command when using maven-assembly-plugin is as follows:

# mvn assembly:assembly 

Packaging generates two JARs at the same time; the one with the jar-with-dependencies suffix contains the third-party dependencies. The suffix is specified by the <id> tag in assembly.xml and can be customized. This JAR can be submitted directly to the cluster environment.


IV. The maven-shade-plugin Plugin

4.1 Official documentation

The third way is to use maven-shade-plugin. Given that maven-assembly-plugin already exists, why is maven-shade-plugin needed? The official documentation explains this as well, in the Storm HDFS Integration chapter:

When packaging your topology, it's important that you use the maven-shade-plugin as opposed to the maven-assembly-plugin.

The shade plugin provides facilities for merging JAR manifest entries, which the hadoop client leverages for URL scheme resolution.

If you experience errors such as the following:

java.lang.RuntimeException: Error preparing HdfsBolt: No FileSystem for scheme: hdfs

it's an indication that your topology jar file isn't packaged properly.

If you are using maven to create your topology jar, you should use the following maven-shade-plugin configuration to create your topology jar.

The first sentence is quite clear: when integrating with HDFS, you must use maven-shade-plugin instead of maven-assembly-plugin, otherwise the RuntimeException above is thrown.

maven-shade-plugin has many other advantages when packaging. For example, a project may depend on many JARs, and those JARs in turn depend on other JARs; when the project therefore pulls in resource files with the same name from different dependencies, the shade plugin attempts to merge all of those resource files during packaging, whereas the assembly plugin performs an overwrite, keeping only one copy.
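Why merging matters can be sketched with plain files standing in for JAR entries (the demo itself is hypothetical; the provider class names follow the Hadoop FileSystem service file convention). Two dependencies both ship a META-INF/services/org.apache.hadoop.fs.FileSystem file, and only concatenation preserves every provider:

```shell
# Stand-ins for the same service file shipped by two different dependencies.
workdir=$(mktemp -d)
echo "org.apache.hadoop.hdfs.DistributedFileSystem" > "$workdir/from-hadoop-hdfs"
echo "org.apache.hadoop.fs.LocalFileSystem" > "$workdir/from-hadoop-common"

# shade's ServicesResourceTransformer: concatenate, so both providers survive.
cat "$workdir/from-hadoop-hdfs" "$workdir/from-hadoop-common" > "$workdir/merged"

# assembly-style packaging: one copy overwrites the other, losing a provider.
cp "$workdir/from-hadoop-common" "$workdir/overwritten"

wc -l < "$workdir/merged"       # 2 providers kept
wc -l < "$workdir/overwritten"  # only 1 left
```

With only the overwritten file, the DistributedFileSystem provider is gone, which is exactly the "No FileSystem for scheme: hdfs" symptom quoted above.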

4.2 Configuration

An example configuration for packaging with maven-shade-plugin is as follows:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.sf</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.dsa</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>META-INF/*.rsa</exclude>
                    <exclude>META-INF/*.EC</exclude>
                    <exclude>META-INF/*.ec</exclude>
                    <exclude>META-INF/MSFTSIG.SF</exclude>
                    <exclude>META-INF/MSFTSIG.RSA</exclude>
                </excludes>
            </filter>
        </filters>
        <artifactSet>
            <excludes>
                <exclude>org.apache.storm:storm-core</exclude>
            </excludes>
        </artifactSet>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer
                       implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer
                       implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

The configuration example above comes from the Storm GitHub repository; some explanation follows:

The configuration excludes certain files because some JARs are generated with signature files produced by jarsigner (for integrity checking), stored as two files in the META-INF directory:

  • a signature file, with a .SF extension;
  • a signature block file, with a .DSA, .RSA, or .EC extension;

If such packages are referenced repeatedly, packaging may fail with an Invalid signature file digest for Manifest main attributes exception, so these files are excluded in the configuration.

4.3 Packaging command

The packaging command when using maven-shade-plugin is the same as ordinary packaging:

# mvn package

Packaging generates two JARs; when submitting to the server cluster, use the one whose name does not begin with original-.


V. Conclusion

Having described the three packaging methods in detail, the conclusion is this: maven-shade-plugin is the recommended packaging plugin, because it is the most versatile, it is simple to use, and all the examples in the Storm GitHub repository are packaged this way.

VI. Packaging Considerations

Whichever packaging method you use, you must exclude the Storm JARs that the cluster environment already provides. The typical one is storm-core, which is already present in the lib directory of the Storm installation.
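An alternative to excluding storm-core inside the plugin configuration (a sketch, not from the official examples; the version shown matches the exception log below) is to mark the dependency itself as provided, since both plugins leave provided-scope dependencies out of the final JAR:

```xml
<!-- Marking storm-core as provided keeps it out of the packaged JAR,
     because the cluster supplies it on the classpath at runtime. -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>1.2.2</version>
    <scope>provided</scope>
</dependency>
```

The trade-off is that a provided-scope storm-core is not on the classpath when running the topology in local mode, so projects that also run locally often keep the exclusion in the plugin configuration instead.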


If storm-core is not excluded, the following exception is usually thrown:

Caused by: java.lang.RuntimeException: java.io.IOException: Found multiple defaults.yaml resources.   
You're probably bundling the Storm jars with your topology jar.   
[jar:file:/usr/app/apache-storm-1.2.2/lib/storm-core-1.2.2.jar!/defaults.yaml,   
jar:file:/usr/appjar/storm-hdfs-integration-1.0.jar!/defaults.yaml]
        at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:384)
        at org.apache.storm.utils.Utils.readDefaultConfig(Utils.java:428)
        at org.apache.storm.utils.Utils.readStormConfig(Utils.java:464)
        at org.apache.storm.utils.Utils.<clinit>(Utils.java:178)
        ... 39 more


Reference material

For more on maven-shade-plugin configuration, refer to: maven-shade-plugin Getting Started

More articles in the big data series can be found in the GitHub open source project: Big Data Getting Started


Origin: juejin.im/post/5d8593a1f265da03be491575