Flume + Kafka + Storm + Redis: a stream-computing WordCount example

1. Introduction

This article uses Flume, Kafka, Storm, and Redis to implement a small stream-computing WordCount example.

The cluster machines are server01, server02, server03.

The data-generating Python script, the Flume collection agent, the ZooKeeper cluster, and the Kafka cluster all run on these three machines at the same time.

Start the redis-server service on server01.

2. The Python script that generates the data

Python script file: produce_log2.py

The script is written in Python 3. Every second it randomly picks one entry from a list and appends it to the file /home/hadoop/log/access.log.

Create the /home/hadoop/log directory on server01, server02, and server03:

mkdir -p /home/hadoop/log

Below is the produce_log2.py script code:

import os
import time
import sched
import random


def create_log():
    # `list` here is the data list defined in the main block below.
    # Append one randomly chosen line to the log file; the context manager
    # closes the file handle after each write.
    with open("/home/hadoop/log/access.log", mode="a+", encoding="utf-8") as file:
        file.write(random.choice(list))

if __name__ == '__main__':
    """
    python3.0 定时执行任务
    """
    list = ['天魁星 呼保义 宋江\n', '天罡星 玉麒麟 卢俊义\n', '天机星 智多星 吴用\n', '天闲星 入云龙 公孙胜\n', '天勇星 大刀 关胜\n', '天雄星 豹子头 林冲\n',
            '天猛星 霹雳火 秦明\n', '天威星 双鞭 呼延灼\n', '天英星 小李广 花荣\n', '天贵星 小旋风 柴进\n', '天富星 扑天雕 李应\n', '天満星 美髯公 朱仝\n',
            '天孤星 花和尚 鲁智深\n', '天伤星 行者 武松\n', '天立星 双枪将 董平\n', '天捷星 没羽箭 张清\n', '天暗星 青面兽 杨志\n', '天佑星 金枪手 徐宁\n',
            '天空星 急先锋 索超\n', '天速星 神行太保 戴宗\n', '天异星 赤发鬼 刘唐\n', '天杀星 黒旋风 李逵\n', '天微星 九纹龙 史进\n', '天究星 没遮拦 穆弘\n',
            '天退星 插翅虎 雷横\n', '天寿星 混江龙 李俊\n', '天剑星 立地太岁 阮小二\n', '天平星 船火儿 张横\n', '天罪星 短命二郎 阮小五\n', '天损星 浪里白跳 张顺\n',
            '天败星 活阎罗 阮小七\n', '天牢星 病关索 杨雄\n', '天慧星 拼命三郎 石秀\n', '天暴星 两头蛇 解珍\n', '天哭星 双尾蝎 解宝\n', '天巧星 浪子 燕青\n',
            '地魁星 神机军师 朱武\n', '地煞星 镇三山 黄信\n', '地勇星 病尉迟 孙立\n', '地杰星 丑郡马 宣赞\n', '地雄星 井木犴 郝思文\n', '地威星 百胜将 韩滔\n',
            '地英星 天目将 彭玘\n', '地奇星 圣水将 单廷圭\n', '地猛星 神火将 魏定国\n', '地文星 圣手书生 萧让\n', '地正星 铁面孔目 裴宣\n', '地阔星 摩云金翅 欧鹏\n',
            '地阖星 火眼狻猊 邓飞\n', '地强星 锦毛虎 燕顺\n', '地暗星 锦豹子 杨林\n', '地轴星 轰天雷 凌振\n', '地会星 神算子 蒋敬\n', '地佐星 小温侯 吕方\n',
            '地佑星 赛仁贵 郭盛\n', '地灵星 神医 安道全\n', '地兽星 紫髯伯 皇甫端\n', '地微星 矮脚虎 王英\n', '地慧星 一丈青 扈三娘\n', '地暴星 丧门神 鲍旭\n',
            '地然星 混世魔王 樊瑞\n', '地猖星 毛头星 孔明\n', '地狂星 独火星 孔亮\n', '地飞星 八臂哪吒 项充\n', '地走星 飞天大圣 李衮\n', '地巧星 玉臂匠 金大坚\n',
            '地明星 铁笛仙 马麟\n', '地进星 出洞蛟 童威\n', '地退星 翻江蜃 童猛\n', '地满星 玉幡竿 孟康\n', '地遂星 通臂猿 侯健\n', '地周星 跳涧虎 陈达\n',
            '地隐星 白花蛇 杨春\n', '地异星 白面郎君 郑天寿\n', '地理星 九尾亀 陶宗旺\n', '地俊星 铁扇子 宋清\n', '地乐星 铁叫子 乐和\n', '地捷星 花项虎 龚旺\n',
            '地速星 中箭虎 丁得孙\n', '地镇星 小遮拦 穆春\n', '地羁星 操刀鬼 曹正\n', '地魔星 云里金刚 宋万\n', '地妖星 摸着天 杜迁\n', '地幽星 病大虫 薛永\n',
            '地僻星 打虎将 李忠\n', '地空星 小霸王 周通\n', '地孤星 金钱豹子 汤隆\n', '地全星 鬼脸儿 杜兴\n', '地短星 出林龙 邹渊\n', '地角星 独角龙 邹润\n',
            '地囚星 旱地忽律 朱贵\n', '地蔵星 笑面虎 朱富\n', '地伏星 金眼彪 施恩\n', '地平星 鉄臂膊 蔡福\n', '地损星 一枝花 蔡庆\n', '地奴星 催命判官 李立\n',
            '地察星 青眼虎 李云\n', '地悪星 没面目 焦挺\n', '地丑星 石将军 石勇\n', '地数星 小尉遅 孙新\n', '地阴星 母大虫 顾大嫂\n', '地刑星 菜园子 张青\n',
            '地壮星 母夜叉 孙二娘\n', '地劣星 活闪婆 王定六\n', '地健星 険道神 郁保四\n', '地耗星 白日鼠 白胜\n', '地贼星 鼓上蚤 时迁\n', '地狗星 金毛犬 段景住\n']

    # Schedule create_log to run once per second, indefinitely.
    schedule = sched.scheduler(time.time, time.sleep)
    while True:
        schedule.enter(1, 0, create_log)
        schedule.run()

The script can be tested on its own.
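
For example (a quick local check), start the generator on one machine and watch the file grow from a second terminal:

# terminal 1: start the generator (use python3 if that is how Python 3 is invoked on your machines)
python produce_log2.py

# terminal 2: watch new lines being appended
tail -f /home/hadoop/log/access.log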

3. Flume configuration

Enter the /hadoop/flume/conf directory:

cd /hadoop/flume/conf

Create a file named flume-sink-kafka.conf and edit it as follows:

a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /home/hadoop/log/access.log

a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100

# Kafka sink
a1.sinks.k1.type= org.apache.flume.sink.kafka.KafkaSink
# Kafka broker addresses and ports
a1.sinks.k1.brokerList=server01:9092,server02:9092,server03:9092
# Kafka topic
a1.sinks.k1.topic=fksrtest
# serialization class
a1.sinks.k1.serializer.class=kafka.serializer.StringEncoder
a1.sinks.k1.requiredAcks = 1


a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1  

4. Kafka-related cluster startup and topic creation

4.1 Start the ZooKeeper cluster

Start ZooKeeper on each of server01, server02, and server03:

zkServer.sh start

For how to set up the ZooKeeper cluster, refer to: Zookeeper cluster environment construction
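
Once ZooKeeper is running on all three machines, each node's role can be checked; one node should report itself as leader and the other two as followers:

zkServer.sh status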

4.2 Start the Kafka cluster

For how to set up the Kafka cluster, refer to: Kafka cluster construction and producer-consumer case

Start the Kafka broker on each machine.

Enter /hadoop/kafka:

cd /hadoop/kafka

Run the startup command:

./bin/kafka-server-start.sh -daemon ./config/server.properties

4.3 Check the processes with jps after startup

If each machine shows the following processes, the startup was successful:

Kafka
QuorumPeerMain
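
For example, running jps on any of the nodes should print output roughly like the following (the process IDs will differ):

jps
2381 QuorumPeerMain
2755 Kafka
3102 Jps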


4.4 Create the Kafka topic

# create a topic named fksrtest
kafka-topics.sh --create --zookeeper server01:2181,server02:2181,server03:2181 --replication-factor 3 --partitions 3 --topic fksrtest
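
Optionally verify the topic afterwards with the describe option:

kafka-topics.sh --describe --zookeeper server01:2181,server02:2181,server03:2181 --topic fksrtest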

5. Production test

Test the data-production side to make sure the previous steps work.

5.1 Run the produce_log2.py data-generation script

Execute the Python script on each machine:

python produce_log2.py

Run the script and watch how access.log changes.

Checking the file size repeatedly shows that access.log keeps growing as new lines are appended.

5.2 Start the Flume collection agent

Run the Flume agent on each machine:

# enter the /hadoop/flume directory
cd /hadoop/flume

# run the flume collection agent
bin/flume-ng agent -c conf -f conf/flume-sink-kafka.conf -name a1 -Dflume.root.logger=INFO,console
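
To leave the agent running in the background instead of keeping the console attached, the same command can be wrapped with nohup (console output is simply discarded here; adjust to taste):

nohup bin/flume-ng agent -c conf -f conf/flume-sink-kafka.conf -name a1 > /dev/null 2>&1 &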

5.3 On server01, run the kafka-console-consumer.sh command to check the data arriving in Kafka

cd /hadoop/kafka

./bin/kafka-console-consumer.sh --zookeeper server01:2181,server02:2181,server03:2181 --from-beginning --topic fksrtest

The consumer prints the collected lines to the console, and new data keeps arriving in the window as the producers run.

This confirms that the production side works; next we build the consumer side.

6. Write Storm processing logic

6.1 pom.xml configuration dependencies

<dependencies>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <!-- To package a jar and run it on a Storm cluster, uncomment the line below -->
            <!-- <scope>provided</scope>-->
            <version>0.9.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-kafka</artifactId>
            <version>0.9.5</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.9.2</artifactId>
            <version>0.8.1.1</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

6.2 The topology class MyTopology

public class MyTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // ZooKeeper ensemble used by the Kafka spout to find brokers and store its offsets
        BrokerHosts hosts = new ZkHosts("server01:2181,server02:2181,server03:2181");
        String topic = "fksrtest";
        // ZooKeeper path under which the spout keeps its consumer offsets
        String zkRoot = "/fksr";
        // consumer id used when storing the offsets
        String id = "fksrtest_id";
        SpoutConfig spoutConf = new SpoutConfig(hosts, topic, zkRoot, id);
        builder.setSpout("spout", new KafkaSpout(spoutConf), 2);

        builder.setBolt("bolt1", new Bolt1(), 4).shuffleGrouping("spout");
        builder.setBolt("bolt2", new Bolt2(), 2).shuffleGrouping("bolt1");


        StormTopology topology = builder.createTopology();
        LocalCluster localCluster = new LocalCluster();
        localCluster.submitTopology("fksrwordcount", new Config(), topology);
    }
}
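
One caveat: with shuffleGrouping between bolt1 and bolt2, tuples for the same word can land on different Bolt2 tasks, and each task keeps its own local map, so the counts written to Redis may be split between tasks. If exact per-word counts are required, a fields grouping on the word field routes all tuples for a given word to the same task, for example (this needs the backtype.storm.tuple.Fields import in MyTopology):

        builder.setBolt("bolt2", new Bolt2(), 2).fieldsGrouping("bolt1", new Fields("word"));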

6.3 Bolt1 class

Bolt1 splits each incoming line into words and emits (word, num) tuples to Bolt2. The raw Kafka message arrives as a byte[] under a field named "bytes", which is the default output field of the storm-kafka spout's scheme.

public class Bolt1 extends BaseBasicBolt {
    public void execute(Tuple input, BasicOutputCollector collector) {
        // the Kafka spout emits the raw message as a byte[] in the "bytes" field
        byte[] bytes = (byte[]) input.getValueByField("bytes");
        // decode as UTF-8, the encoding the producer script writes
        String line = new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
        // split the line into words and emit (word, 1) for each word
        String[] splits = line.split(" ");
        for (String word : splits) {
            collector.emit(new Values(word, 1));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "num"));
    }
}

6.4 Bolt2 class

Bolt2 accumulates the per-word counts in a local map and writes the map to Redis as a hash.

public class Bolt2 extends BaseBasicBolt {

    private Jedis jedis;

    private HashMap<String, String> map = new HashMap<String, String>();

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        super.prepare(stormConf, context);
        // one Redis connection per bolt instance
        jedis = new Jedis("server01", 6379);
    }

    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = (String) input.getValueByField("word");
        Integer num = (Integer) input.getValueByField("num");

        // accumulate the count for this word in the local map
        String result = map.get(word);
        if (StringUtils.isNotEmpty(result)) {
            int res = Integer.parseInt(result);
            map.put(word, (num + res) + "");
        } else {
            map.put(word, num + "");
        }
        // write the whole map to the Redis hash "fksrtest"
        jedis.hmset("fksrtest", map);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {

    }
}

For convenience, Storm's local mode is used here for testing.
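
If you later want to package the topology and submit it to a real Storm cluster (which is what the provided-scope hint in pom.xml is for), the LocalCluster lines in MyTopology can be replaced with something along these lines (a sketch; StormSubmitter lives in the backtype.storm package in Storm 0.9.x, and main needs to declare the checked exceptions, e.g. throws Exception):

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("fksrwordcount", conf, builder.createTopology());

The packaged jar is then submitted with the storm jar command.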

7. Start the Redis service

Start redis-server on the server01 machine:

redis-server /usr/local/redis/redis-conf
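
A quick check that the server is reachable (assuming redis-cli is installed); it should reply PONG:

redis-cli -h server01 -p 6379 ping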

8. Consumer startup test

8.1 Start the Storm program

Run the MyTopology class directly (for example from the IDE); the local cluster starts and the topology begins consuming from Kafka.

8.2 Run the Java test code against Redis

public class RedisTest {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("server01", 6379);
        // hkeys lists every word stored in the hash; hmget returns its count as a single-element list
        for (String key : jedis.hkeys("fksrtest")) {
            System.out.println(key + ":" + jedis.hmget("fksrtest", key));
        }
    }
}
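
Alternatively (assuming redis-cli is available), the same hash can be inspected from the shell:

redis-cli -h server01 -p 6379 hgetall fksrtest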

Either way, we can see the word counts accumulating in Redis.

This completes our small example.
