Storm introductory programming

Example: telecommunications call records

Mobile phone calls and their durations are the input to Apache Storm; Storm processes them and aggregates the total number of calls and the total call duration between each caller and callee pair.

Programming approach:

In Storm, data processing is abstracted as a topology. The main components of a topology are spouts and bolts, and the data is passed between components as streams of tuples. The data stream keeps flowing through the topology, and the data is processed as it passes through.

1. Create a Spout class

This part creates the source of the data stream.

Create a class that implements the IRichSpout interface and implement the appropriate methods. The meanings of the main methods are:

  • open - provides an execution environment for the spout. The executor calls this method to initialize the spout; put any first-run initialization logic here.
  • nextTuple - emits the generated data through the collector. This is the core method that produces the data stream.
  • close - called when the spout is about to be shut down.
  • declareOutputFields - declares the output schema of the tuples, i.e. the format of the data stream emitted by the spout.
  • ack - acknowledges that a specific tuple has been processed.
  • fail - signals that a specific tuple was not processed and should be reprocessed.
open(Map conf, TopologyContext context, SpoutOutputCollector collector)
  • conf - the Storm configuration provided for this spout.
  • context - provides the spout's position in the topology, its task id, and its input and output information.
  • collector - enables us to emit tuples that will be processed by the bolts.
nextTuple()

nextTuple() is called periodically from the same loop as ack() and fail(). It must release control of the thread when there is no work to do, so that the other methods have a chance to be called. Therefore, the first line of nextTuple should check whether processing has finished; if so, it should sleep for at least one millisecond before returning, to reduce processor load.
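A minimal sketch of this idle pattern, assuming a boolean completed flag like the one declared in the example spout further below (where and how it gets set is up to your own logic):

@Override
public void nextTuple() {
    // nothing left to emit: yield the thread briefly so ack()/fail() can run
    if (completed) {
        try {
            Thread.sleep(1); // at least one millisecond to reduce CPU load
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return;
    }
    // ... otherwise generate and emit the next tuples here ...
}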

declareOutputFields(OutputFieldsDeclarer declarer)

declarer - used to declare output stream ids, output fields, and so on. This method specifies the output schema of the tuples.
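Besides the default stream, declareStream can declare additional named streams; a small sketch (the stream id "backup" is only an illustration, not part of this example):

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // default stream with three named fields
    declarer.declare(new Fields("from", "to", "duration"));
    // an extra named stream; the id "backup" is just an example
    declarer.declareStream("backup", new Fields("from", "to", "duration"));
}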

ack(Object msgId)

This method acknowledges that a specific tuple has been fully processed.

fail(Object o)

This method notifies the spout that a specific tuple has not been fully processed; Storm will replay that tuple.
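The example spout below leaves ack and fail empty because it emits unanchored tuples. For reliable processing, a spout emits each tuple with a message id and keeps it pending until acked; a rough sketch under that assumption (the pending map and the use of idx as the message id are illustrative, not part of the original code):

// extra field for a reliable version of CallLogSpout (needs java.util.HashMap)
private final Map<Object, Values> pending = new HashMap<>();

// in nextTuple(), emit with a message id instead of the unanchored form:
//     pending.put(this.idx, values);
//     this.collector.emit(values, this.idx);

@Override
public void ack(Object msgId) {
    // Storm confirmed this tuple was fully processed downstream
    pending.remove(msgId);
}

@Override
public void fail(Object msgId) {
    // replay a tuple that failed or timed out
    Values values = pending.get(msgId);
    if (values != null) {
        this.collector.emit(values, msgId);
    }
}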

package com.jing.calllogdemo;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

/*
Spout class, responsible for generating the data stream
 */
public class CallLogSpout implements IRichSpout {
    // spout output collector
    private SpoutOutputCollector collector;
    // whether data generation has completed
    private boolean completed = false;
    // context object
    private TopologyContext context;
    // random generator
    private Random randomGenerator = new Random();
    // index
    private Integer idx = 0;

    @Override
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        // first-run initialization
        this.context = topologyContext;
        this.collector = spoutOutputCollector;

    }

    @Override
    public void close() {

    }

    @Override
    public void activate() {

    }

    @Override
    public void deactivate() {

    }

    @Override
    public void nextTuple() {
        // generate call log data

        if (this.idx <= 1000){
            List<String> mobileNumbers = new ArrayList<String>();
            mobileNumbers.add("1234123401");
            mobileNumbers.add("1234123402");
            mobileNumbers.add("1234123403");
            mobileNumbers.add("1234123404");

            Integer localIdx = 0;
            while (localIdx++ < 100 && this.idx++ <1000){
                // pick the caller number
                String caller = mobileNumbers.get(randomGenerator.nextInt(4));
                // pick the callee number
                String callee = mobileNumbers.get(randomGenerator.nextInt(4));
                while (caller.equals(callee)) {
                    // pick the callee again until it differs from the caller
                    callee = mobileNumbers.get(randomGenerator.nextInt(4));
                }
                // simulate the call duration
                Integer duration = randomGenerator.nextInt(60);
                // emit the tuple
                this.collector.emit(new Values(caller, callee, duration));
            }
        }

    }

    @Override
    public void ack(Object o) {

    }

    @Override
    public void fail(Object o) {

    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        // declare the output field structure, i.e. the names of the emitted fields
        outputFieldsDeclarer.declare(new Fields("from", "to", "duration"));

    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
CallLogSpout

2. Create the Bolt classes

This part processes the data stream: a bolt takes tuples as input, processes them, and produces new tuples.

Create a class that implements the IRichBolt interface and implement the appropriate methods.

  • prepare - provides an execution environment for the bolt. The executor calls this method to initialize the bolt.
  • execute - processes a single input tuple.
  • cleanup - called when the bolt is about to be shut down.
  • declareOutputFields - declares the output schema of the tuples.
prepare(Map conf, TopologyContext context, OutputCollector collector)
  • conf - the Storm configuration provided for this bolt.
  • context - provides the bolt's position in the topology, its task id, and its input and output information.
  • collector - enables us to emit the processed tuples.
execute(Tuple tuple)

This is the core method of the bolt: the input tuples are processed here, one tuple per call. The tuple's data can be accessed through the getValue methods of the Tuple class. An input tuple does not have to be processed immediately; several tuples can be processed and then combined into a single output tuple. Output tuples are emitted through the OutputCollector class.
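A small sketch of an execute method that reads fields by name, emits an output tuple anchored to the input, and then acknowledges it (anchoring and the by-name getters are optional; the creator bolt below uses positional getters and no anchoring):

@Override
public void execute(Tuple tuple) {
    // read the input fields by name instead of by position
    String from = tuple.getStringByField("from");
    Integer duration = tuple.getIntegerByField("duration");
    // anchor the output to the input tuple so a downstream failure is replayed from the spout
    collector.emit(tuple, new Values(from, duration));
    // acknowledge the input tuple once it has been handled
    collector.ack(tuple);
}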

cleanup()
declareOutputFields(OutputFieldsDeclarer declarer)

This method specifies the output schema of the tuples; the declarer parameter is used to declare output stream ids, output fields, and so on.

There are two bolts:

The call log creator bolt receives the call log tuples. Each call log tuple contains the caller number, the callee number, and the call duration. This bolt simply creates a new value by combining the caller number and the callee number. The new value has the format "caller number - callee number" and is named as a new field, "call".

package com.jing.calllogdemo;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Map;
/*
Bolt that creates the call log entry
 */
public class CallLogCreatorBolt implements IRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.collector = outputCollector;
    }

    @Override
    public void execute(Tuple tuple) {
        // handle a new call record
        String from = tuple.getString(0);
        String to = tuple.getString(1);
        Integer duration = tuple.getInteger(2);
        // produce the new tuple: "caller-callee" plus the duration
        String fromTo = from + "-" + to;
        collector.emit(new Values(fromTo, duration));

    }

    @Override
    public void cleanup() {

    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        // set the names of the output fields
        outputFieldsDeclarer.declare(new Fields("call", "duration"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
CallLogCreatorBolt

The call log counter bolt receives the call and its duration as a tuple. It initializes a map in the prepare method. In the execute method it checks the tuple: for each new "call" value it creates an entry in the map, and for an existing entry it accumulates the value. When the bolt shuts down, cleanup prints the accumulated results.

package com.jing.calllogdemo;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;

import java.util.HashMap;
import java.util.Map;
/*
Call log counter bolt
 */
public class CallLogCounterBolt implements IRichBolt {
    Map<String, Integer> counterMap;
    private OutputCollector collector;
    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.counterMap = new HashMap<String, Integer>();
        this.collector = outputCollector;
    }

    @Override
    public void execute(Tuple tuple) {
        String call = tuple.getString(0);
        Integer duration = tuple.getInteger(1);
        if(!counterMap.containsKey(call)){
            // first record for this caller-callee pair
            counterMap.put(call, 1);
        }else {
            // accumulate the value for an existing pair
            Integer c = counterMap.get(call) + duration;
            counterMap.put(call, c);
        }
        collector.ack(tuple);

    }

    @Override
    public void cleanup() {
        for(Map.Entry<String, Integer> entry : counterMap.entrySet()){
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }

    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("call"));

    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
CallLogCounterBolt

 

3. Create the main entry class and build the topology

A Storm topology is essentially a Thrift structure. The TopologyBuilder class provides a simple and easy way to create complex topologies. TopologyBuilder has methods to set the spout (setSpout) and to set bolts (setBolt). Finally, it has createTopology to create the topology. The following code fragment creates a topology:


 

package com.jing.calllogdemo;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class App {
    public static void main(String[] args) throws InterruptedException, InvalidTopologyException, AuthorizationException, AlreadyAliveException {
        TopologyBuilder builder = new TopologyBuilder();

        // set the spout
        builder.setSpout("spout", new CallLogSpout());
        // set the creator bolt
        builder.setBolt("creator-bolt", new CallLogCreatorBolt()).shuffleGrouping("spout");
        // set the counter bolt
        builder.setBolt("counter-bolt", new CallLogCounterBolt()).
                fieldsGrouping("creator-bolt", new Fields("call"));

        Config config = new Config();
        config.setDebug(true);

        /* local mode
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("LogAnalyserStorm", config, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();

         */

        StormSubmitter.submitTopology("myTop", config, builder.createTopology());


    }
}
App

 

 

For development purposes, we can create a local cluster using a "LocalCluster" object and then submit the topology with the "submitTopology" method of the "LocalCluster" class. One of the arguments of "submitTopology" is an instance of the "Config" class. The "Config" class is used to set configuration options before the topology is submitted. These options are merged with the cluster configuration at run time and sent to all tasks (spouts and bolts) via the prepare method. Once the topology is submitted to the local cluster, we wait 10 seconds while the cluster computes the submitted topology, and then shut the cluster down with the "shutdown" method of "LocalCluster". The complete code is shown above; the local-mode section is commented out in App.
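A few Config options that are commonly set before submitting; the specific values here are only illustrative:

Config config = new Config();
config.setDebug(true);           // log every emitted tuple (verbose; development only)
config.setNumWorkers(2);         // number of worker processes when running on a real cluster
config.setMaxSpoutPending(1000); // cap on pending (un-acked) tuples per spout task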
 
 
Reference:

Author: raincoffee
Link: https://www.jianshu.com/p/7af9693d9ffc
Source: Jianshu
Copyright belongs to the author. For reproduction in any form, please contact the author for authorization and indicate the source.

 

Running the topology on a cluster (production environment)

1) Change the submission mode in the code (submit with StormSubmitter instead of the local cluster)

2) Package the jar with mvn

3) Run the topology on Linux:

> storm jar XXX.jar  full.class.name

 

Origin www.cnblogs.com/Jing-Wang/p/11028749.html