Common real-time stream processing methods for big data (Kafka + Spark Streaming or ETL + Kudu)

I just finished a project, and I have a habit of summarizing and reviewing after each one.

Since I had some free time these past couple of days, I put together a demo based on the project, which can be regarded as a concrete instantiation of it.

1. Project process

The core of the project: demonstrate the conventional ways of processing a real-time data stream.

Overall process:
flow chart

After planning the overall flow, we can split it up and implement each part one by one.


2. Simulate data and send it over UDP

UDP is a connectionless transport-layer protocol in the OSI reference model. It is mainly used when packets do not need to arrive in order; checking and re-ordering of packets is left to the application layer. It provides a simple, unreliable, transaction-oriented message delivery service.

SCADA (Supervisory Control And Data Acquisition) is a computer-based system for data acquisition and supervisory control, closely related to DCS and power-automation monitoring systems. It is very widely used, covering data acquisition, supervisory control and process control in electric power, metallurgy, petroleum, chemicals, gas, railways and other fields.

UDP sees some use in SCADA systems, so it can also serve as a small part of a real-time data flow (for example, a physical device sends data to a designated port, and the underlying storage listens on that port to pick the data up).

Even though we are just fabricating data, it still deserves a small model. I designed
5 columns: time, date, id, name, value
where time is accurate to the second, date is the date (yyyy-MM-dd), id is an incrementing int, and value is a randomly generated number.

Thinking about this for a moment, it is not hard to see that the work can be split into three classes to reduce complexity and improve readability.
They are:

  • Format date class
  • Get random value class
  • Send to UDP class

1. Format date class

Its main purpose is to obtain the current timestamp and convert it to a second-level time string and a date string. The main method only prints for testing and can be omitted.

package com.example.utils;

import java.text.SimpleDateFormat;

public class TimeStampFormat {

    // Get the current timestamp
    private Long timestamp = System.currentTimeMillis();

    // Timestamp to time string (accurate to the second)
    public String getTime() {
        SimpleDateFormat formatTime = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        return formatTime.format(timestamp);
    }

    // Timestamp to date string
    public String getDate() {
        SimpleDateFormat formatTime = new SimpleDateFormat("yyyy-MM-dd");
        return formatTime.format(timestamp);
    }

    public static void main(String[] args) {
        String time = new TimeStampFormat().getTime();
        String date = new TimeStampFormat().getDate();
        System.out.println(time);
        System.out.println(date);
    }
}

2. Get random value class

The getInt method is used later, when generating a name, to index into the surname and given-name arrays; getDouble produces the value column.

package com.example.utils;

public class GetRandom {

    // Get one random number in [0, 1); note it is computed once per instance,
    // so getInt() and getDouble() on the same instance derive from the same value
    private double random = Math.random();

    // Convert the random number to an integer, used as an array index
    public int getInt() {
        return (int) (random * 10);
    }

    // Convert the random number to a fixed-precision decimal (6 digits)
    public Double getDouble() {
        return Double.valueOf(String.format("%.6f", random * 100));
    }

    public static void main(String[] args) {
        System.out.println(new GetRandom().getInt());
        System.out.println(new GetRandom().getDouble());
        System.out.println(new GetRandom().random);
    }
}

3. Send to UDP class

At first the sending logic was written directly in main, but pulling it out into its own method is more readable.

Tip: each call to this send method news up a socket and a packet, sends once and then closes, so the resource churn is relatively high. You could restructure it to reduce the cost of object creation in Java (a minimal sketch follows after the code below).

package com.example.service;

import com.example.utils.GetRandom;
import com.example.utils.TimeStampFormat;

import java.io.IOException;
import java.net.*;
import java.util.concurrent.TimeUnit;


public class SendToUDP {

    // IP
    private static String IP = "10.168.1.xx";
//    private static String IP = "127.0.0.1";

    // port
    private static String PORT = "3927";

    private static void send(byte[] sendValue) throws SocketException, UnknownHostException {

        // Create the socket
        DatagramSocket ds = new DatagramSocket();

        // Pack the data
        DatagramPacket datagramPacket = new DatagramPacket(sendValue, sendValue.length, InetAddress.getByName(IP), Integer.parseInt(PORT));

        // send
        try {
            ds.send(datagramPacket);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            ds.close();
        }
    }

    public static void main(String[] args) throws InterruptedException, SocketException, UnknownHostException {

        // Build the data: 1 time; 2 date; 3 id; 4 name; 5 value
        int i = 0;
        String[] surNameList = "李、王、张、刘、陈、杨、赵、黄、周、吴".split("、");
        String[] nameList = "梦琪、忆柳、之桃、慕青、问兰、尔岚、元香、初夏、沛菡、傲珊".split("、");

        // Keep sending data
        while (true) {
            TimeStampFormat ts = new TimeStampFormat();
            GetRandom rd = new GetRandom();

            // 1 time
            String time = ts.getTime();

            // 2 date
            String date = ts.getDate();

            // 3 id
            i++;

            // 4 name = surNameList[index] + nameList[index]
            String name = surNameList[rd.getInt()] + nameList[rd.getInt()];

            // 5 value
            Double doubleValues = rd.getDouble();

            // Concatenate the record
            byte[] sendValue = String.format("%s,%s,%s,%s,%s", time, date, i, name, doubleValues).getBytes();

            System.out.println(String.format("%s,%s,%s,%s,%s", time, date, i, name, doubleValues));

            send(sendValue);
            // Sleep 1 nanosecond before sending again
            TimeUnit.NANOSECONDS.sleep(1);
        }
    }
}
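
Following up on the tip above, here is a minimal sketch (my own variant, not part of the original project) that creates the DatagramSocket once and reuses it for every send instead of creating and closing one per call; the IP and port are placeholders.

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class ReusableUdpSender implements AutoCloseable {

    private final DatagramSocket socket;   // created once and reused
    private final InetAddress address;
    private final int port;

    public ReusableUdpSender(String ip, int port) throws IOException {
        this.socket = new DatagramSocket();
        this.address = InetAddress.getByName(ip);
        this.port = port;
    }

    // Send one datagram without recreating the socket
    public void send(byte[] payload) throws IOException {
        socket.send(new DatagramPacket(payload, payload.length, address, port));
    }

    @Override
    public void close() {
        socket.close();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder IP/port, same style as in SendToUDP
        try (ReusableUdpSender sender = new ReusableUdpSender("127.0.0.1", 3927)) {
            for (int i = 0; i < 10; i++) {
                sender.send(("test," + i).getBytes());
            }
        }
    }
}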

running result:
send to UDP

3. Parse UDP and send it to Kafka

This part is relatively simple: configure a Kafka producer, parse the received UDP packets into the comma-separated format, and send them to Kafka.

1. Kafka helper class

package com.example.utils;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

import java.util.Properties;

public class KafkaUtils {

    public Producer<String, String> getProducer() {

        // Instantiate the configuration
        Properties props = new Properties();
        // Cluster address; separate multiple brokers with ","
        props.put("bootstrap.servers", "10.168.1.xx:9092");
        // key/value serialization; strings here, using Kafka's built-in serializer classes
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // props.put("partitioner.class", "com.kafka.demo.Partitioner"); // custom partitioning, not used here
        props.put("request.required.acks", "1");

        Producer<String, String> kafkaProducer = new KafkaProducer<String, String>(props);

        return kafkaProducer;
    }

    public void closeRes(Producer<String, String> kafkaProducer) {

        if (kafkaProducer != null) {
            try {
                kafkaProducer.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

2. Parse UDP and send to Kafka

package com.example.dao;

import com.example.utils.KafkaUtils;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class ReceiveUDPSendToKafka {

    public static void main(String[] args) throws IOException {

        Logger.getLogger("org").setLevel(Level.INFO);
        // Create the receiving socket
        DatagramSocket ds = new DatagramSocket(3927);

        // Get the Kafka producer
        Producer<String, String> producer = new KafkaUtils().getProducer();

        while (true) {

            // Receive data
            byte[] bytes = new byte[1024];
            // dp
            DatagramPacket dp = new DatagramPacket(bytes, bytes.length);
            ds.receive(dp);

            // Parse
            byte[] data = dp.getData();
            int length = dp.getLength();

            // Output
            String outData = new String(data, 0, length);
//            System.out.println(outData);

            // key
            String key = outData.split(",")[2];

            // ProducerRecord takes three arguments here: the topic name, the message key, and the message value
            try {
                producer.send(new ProducerRecord<String, String>("demoTopic", key, outData));
                System.out.println("Sent successfully: " + outData);
            } catch (Exception e) {
                try {
                    new KafkaUtils().closeRes(producer);
                } catch (Exception e1) {
                    e.printStackTrace();
                }
            }
        }
    }
}

Tip: this listener... does not seem to let you specify an IP? So if you want to run it, either change the UDP sender's target address to the local machine, or package the UDP-parsing program and deploy it on the server whose IP the sender targets.
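
If you do want to tie the receiver to a particular local address, one option (an assumption on my part, not something the original project does) is the DatagramSocket constructor that takes a bind address; the IP here is a placeholder and must belong to the machine running the code.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;

public class BindReceiver {
    public static void main(String[] args) throws Exception {
        // Bind to a specific local interface/port instead of all interfaces
        DatagramSocket ds = new DatagramSocket(new InetSocketAddress("127.0.0.1", 3927));
        byte[] buf = new byte[1024];
        DatagramPacket dp = new DatagramPacket(buf, buf.length);
        ds.receive(dp);
        System.out.println(new String(dp.getData(), 0, dp.getLength()));
        ds.close();
    }
}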

Running effect: I simply watch the data with a console consumer on Kafka:
kafka

4. SDC parses Kafka and writes to Kudu

StreamSets Data Collector (SDC) is currently one of the most advanced visual data-collection configuration tools. It is very well suited to real-time data collection, and also covers batch collection and ETL without landing the data. If you are using previous-generation collection tools such as Flume, Logstash, Sqoop or Canal, SDC is recommended as an upgrade replacement.

Apache Kudu is an open-source distributed storage engine that makes it easy to run fast analytics on rapidly changing data, covering both OLAP and OLTP.

For the following two kinds of scenarios I would consider using ETL:

  • There are many data sources, for example migrating the data of every database in MySQL to the big data platform.
  • The data has already been processed into structured form, the real-time requirement is only at the second level, and server resources are abundant.

In the first case there are simply too many sources: if you write code you end up with many versions, or with scripts that have to be scheduled, so I would consider ETL. It is easy to master, but it consumes a lot of resources.

Tip: let me recommend a Chinese site that is very easy to use and very comprehensive.
StreamSets Chinese site: http://streamsets.vip/

This part mainly uses ETL to process the data. For visual ETL there is NiFi in addition to SDC; for non-visual ETL, consider Sqoop and Flume.

1. Data source

input source

2. Processor

Since the data in Kafka is comma-separated, simply use a comma as the delimiter and then bind the column names.
processor

3. Output source

First create a Kudu table:

CREATE TABLE kafka_to_kudu(
  id INT,
  point_date STRING,
  point_time STRING,
  name STRING,
  value DOUBLE,
  PRIMARY KEY (id, point_date))
PARTITION BY HASH (id) PARTITIONS 10,
RANGE (point_date) (
    PARTITION "2021-07-19" <= VALUES < "2021-07-19\000",
    PARTITION "2021-07-20" <= VALUES < "2021-07-20\000",
    PARTITION "2021-07-21" <= VALUES < "2021-07-21\000",
    PARTITION "2021-07-22" <= VALUES < "2021-07-22\000",
    PARTITION "2021-07-23" <= VALUES < "2021-07-23\000",
    PARTITION "2021-07-24" <= VALUES < "2021-07-24\000")
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='10.168.1.12:7051');

This Kudu table is created through Impala. Impala + Kudu consumes a lot of memory (I installed CDH directly); if resources do not permit, it is recommended to write directly to Hive instead.

Output source configuration:
output source

Running effect (there is an error because I had just run the sender in IDEA, which sent data to UDP; the program deployed on the server parsed it and sent it to Kafka, and SDC then parsed that and wrote it to Kudu, but those ids already existed in Kudu, hence the error):
running result

4. Automatically create partitions

The range partitions only go up to the 24th, so data after the 24th cannot be inserted. You can create a script and run it on a schedule.
Script: every day, create the partition for the date 3 days ahead

#!/bin/bash
add=$(date -d "+3 days" "+%Y-%m-%d")
nohup impala-shell -q "alter table default.kafka_to_kudu add range partition '${add}' <= VALUES < '${add}\000'" >> /dev/null &

Cron job: execute once a day (at 01:00)

0 1 * * * sh /root/kudutool/kuduParitition.sh &

5. Spark Streaming parses Kafka and writes to Kudu

This part is the core content. The main flow is: create a StreamingContext, consume from Kafka, convert the received data into a DataFrame, and save it with the native Kudu-Spark API.

package com.example.dao

import java.lang

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Kafka_To_Kudu {

  Logger.getLogger("org").setLevel(Level.WARN)

  def getSparkSess(): StreamingContext = {
    val ssc = new StreamingContext(new SparkConf().setMaster("local[*]").setAppName("Kafka_To_Kudu")
      // Without this set you get an error: object not serializable
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      , Seconds(1))

    // checkpoint
    ssc.checkpoint("hdfs://10.168.1.xx/data/spark/checkpoint/kafka-to-kudu")
    // return
    ssc
  }

  def getKafkaConf(): Map[String, Object] = {
    val kafkaConfig = Map[String, Object](
      "bootstrap.servers" -> "10.168.1.13:9092"
      , "key.deserializer" -> classOf[StringDeserializer] // key deserializer
      , "value.deserializer" -> classOf[StringDeserializer] // value deserializer
      , "group.id" -> "group01"
      // Where to start consuming
      , "auto.offset.reset" -> "latest"
      // Offset commit mode; true = auto-commit
      , "enable.auto.commit" -> (true: lang.Boolean)
    )
    kafkaConfig
  }

  def main(args: Array[String]): Unit = {
    val topic = Array("demoTopic")
    val ssc = getSparkSess()

    // Configure the consumer
    val streams: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream(
      ssc, LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe(topic, getKafkaConf())
    )

    // insert into Kudu
    // Convert to a DataFrame first, otherwise it cannot be saved
    streams.foreachRDD { rdd =>
      // get the SparkSession
      val ss = SparkSession.builder().config(rdd.sparkContext.getConf).getOrCreate()
      // Transform
      val value = rdd.map(_.value().split(",")).map(x => // build Rows dynamically
        Row(x(2).trim.toInt, x(1), x(0), x(3), x(4).trim.toDouble))
      //        .toDF("time", "date", "id", "name", "value")  // Bean + reflection, omitted

      val schema = StructType(List(
        StructField("id", DataTypes.IntegerType, false),
        StructField("point_date", DataTypes.StringType, false),
        StructField("point_time", DataTypes.StringType, false),
        StructField("name", DataTypes.StringType, false),
        StructField("value", DataTypes.DoubleType, false)
      ))

      // Bind the schema
      val frame = ss.createDataFrame(value, schema)

      // frame.printSchema()
      // frame.show()

      // Save
      try {
        frame.write.options(Map("kudu.master" -> "10.168.1.xx"
          , "kudu.table" -> "impala::default.spark_to_kudu"))
          .mode("append")
          .format("org.apache.kudu.spark.kudu")
          .save()
        println("Saved successfully " + frame)
      } catch {
        case e: Exception =>
          try {
            ss.stop()
          } catch {
            case e1: Exception => e1.printStackTrace()
          }
          e.printStackTrace()
      }
    }

    // start
    ssc.start()
    ssc.awaitTermination()
  }
}

maven:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>ScalaSparkStremingConsumerKafka</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.12</scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.1.0-cdh6.2.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.0-cdh6.3.1</version>
        </dependency>
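
        <!-- Assumed additions, not in the original pom: the job imports
             org.apache.spark.streaming.* and writes via format("org.apache.kudu.spark.kudu"),
             which normally requires spark-streaming and the kudu-spark2 connector on the
             compile classpath. The versions below are guesses; align them with your cluster. -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kudu</groupId>
            <artifactId>kudu-spark2_2.11</artifactId>
            <version>1.9.0</version>
        </dependency>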

        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17-cloudera1</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>

            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.4.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.example.dao.Kafka_To_Kudu</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Create a new spark_to_kudu table in Kudu following the kafka_to_kudu creation statement above.

The checkpoint directory is used to record the Kafka offsets that have been consumed; it needs to be created in advance (e.g. with hdfs dfs -mkdir -p) and its permission changed to 777 (e.g. hdfs dfs -chmod -R 777).

running result:
running result

count

6. Using Structured Streaming

This was added on 2021-07-27.
code:

package com.example.dao

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}

object Kafka_To_Kudu_Structured {

  Logger.getLogger("org").setLevel(Level.ERROR)

  def getSparkSess(): SparkSession = {
    val ss = SparkSession.builder().master("local[*]")
      .appName("Kafka_To_Kudu_Structured").getOrCreate()
    ss
  }

  def loadKafkaSession(ss: SparkSession): DataFrame = {
    val df = ss.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "10.168.1.xx:9092") // broker servers
      .option("subscribe", "demoTopic") // topic
      .option("startingOffsets", "latest") // start consuming from the latest offsets
      .load()
    df
  }

  def main(args: Array[String]): Unit = {
    val ss = getSparkSess()
    val df = loadKafkaSession(ss)

    // Implicit conversions
    import ss.implicits._

    // Console output for testing
    /*    df.selectExpr("CAST(value AS STRING)").as[String]
          .writeStream.outputMode("append").format("console")
          .trigger(Trigger.ProcessingTime(0L))
          .option("checkpointLocation","hdfs://10.168.1.12:8020/data/spark_check_point")
          .option("truncate",false)
          .start()*/


    /**
     * Transform + save to Kudu
     * Kudu table: structured_to_kudu
     * Because Structured Streaming does not support Kudu as a sink directly, output to memory first and then save to Kudu
     * Tip 1: if the data volume is too large, this can cause an out-of-memory error
     * Tip 2: if the data needs no processing (filtering, aggregation), using Spark Streaming directly is recommended
     * Tip 3: foreachBatch is used here to save each micro-batch to Kudu...
     **/
    df.selectExpr("CAST(value AS STRING)").as[String]
      .map(line => {
        val arr: Array[String] = line.split(",")
        // Debug output
        // println(arr(2).toInt, arr(1), arr(0), arr(3), arr(4).toDouble)
        (arr(2).toInt, arr(1), arr(0), arr(3), arr(4).toDouble)
      }).toDF("id", "point_date", "point_time", "name", "value")
      .writeStream.outputMode("append").foreachBatch((df, batchId) => {
        // df: the current micro-batch DataFrame; batchId: the current batch id
        if (df.count() != 0) {
          df.cache() // cache in memory for speed
          df.write.mode(SaveMode.Append).format("org.apache.kudu.spark.kudu")
            // set the Kudu master (IP address)
            .option("kudu.master", "10.168.1.xx")
            // set the Kudu table name
            .option("kudu.table", "impala::default.structured_to_kudu")
            // save
            .save()
          println("Saved successfully! " + df)
        }
      })
      .trigger(Trigger.ProcessingTime(0L))
      .start()


    // run
    ss.streams.awaitAnyTermination()
  }
}

maven:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>ScalaSparkStremingConsumerKafka</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.12</scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.1.0-cdh6.2.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.0-cdh6.3.1</version>
        </dependency>
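
        <!-- Assumed additions, not in the original pom: Structured Streaming's
             readStream.format("kafka") normally requires spark-sql-kafka-0-10, and writing
             via format("org.apache.kudu.spark.kudu") requires the kudu-spark2 connector.
             The versions below are guesses; align them with your cluster. Note also that
             <mainClass> further down still points at com.example.dao.Kafka_To_Kudu; change
             it to Kafka_To_Kudu_Structured if you package this job. -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kudu</groupId>
            <artifactId>kudu-spark2_2.11</artifactId>
            <version>1.9.0</version>
        </dependency>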

        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17-cloudera1</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>

            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.4.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.example.dao.Kafka_To_Kudu</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>



Finally... I packaged it up to run on the server... but I keep getting an error:

error screenshot
