Using IDEA on Windows to Remotely Debug Spark Programs and Read HBase on Hadoop

Environment:
Windows 7
JDK 1.8
Hadoop 2.7.3 (plus the winutils.exe tool)
IntelliJ IDEA 2017.3 x64
IDEA 2017.3 Scala support package
Spark 2.1.1
Scala 2.11.4

Step 0: Configure system environment variables

0.1 JDK 1.8 and Scala 2.11.4 are configured in the usual way; no details needed.
0.2 Hadoop configuration under Windows (2.7.3 here):
Copy the hadoop-2.7.3 installation directory from the cluster into the root of any drive on the Windows machine.
Download the winutils.exe tool build for Hadoop 2.7.3 (link: https://pan.baidu.com/s/1pKWAGe3, password: zyi7).
Replace the original Hadoop bin directory with the downloaded bin.
Add hadoop-2.7.3 to the system environment variables (HADOOP_HOME); adding hadoop-2.7.3/bin is optional.
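
If Spark later complains that it cannot find winutils.exe, the same Hadoop home can also be set from code before creating the SparkSession (the Java code in Case 1 does exactly this). A minimal check, assuming Hadoop was unpacked to E:\hadoop-2.7.3 (the class name is just for illustration):

import java.io.File;

public class CheckWinutils {
    public static void main(String[] args) {
        // Same effect as setting HADOOP_HOME in the system environment variables.
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.3");

        // Verify that the downloaded winutils.exe actually replaced the original bin.
        File winutils = new File("E:\\hadoop-2.7.3\\bin\\winutils.exe");
        System.out.println("winutils.exe found: " + winutils.exists());
    }
}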

Step 1: Configure IDEA

1.1 Download and install IDEA (https://www.jetbrains.com/idea/).
After installing, do not start it yet; wait for the crack.
Crack (if you can afford it, please support the genuine product; it is a great tool after all):
Download the crack package: https://pan.baidu.com/s/1eRSjwJ4, password: mo6d.
Copy the crack package directly into the bin directory of the installation directory.
1.2 Configure the IDEA environment
Download the Scala support package for IDEA 2017.3.
Address: https://pan.baidu.com/s/1mixLiPU, password: dbzu.
Install the IDEA 2017.3 Scala support package (required).
(screenshot)

Step 2: Development

2.1 Create a project (a Maven project is convenient for development; create one that supports both Java and Scala).
Choose the maven-archetype-quickstart archetype.
(screenshot)
Ignore the content of the group id, anything will do to get started; delete the SNAPSHOT part of the version.
(screenshot)
2.2 Modify the pom.xml file and add the dependencies for the frameworks used.
Add the following:

<properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

        <scala.version>2.11.4</scala.version>
        <hbase.version>1.2.5</hbase.version>
        <spark.version>2.1.1</spark.version>
        <hadoop.version>2.7.3</hadoop.version>
    </properties>
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>

    <!-- spark -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- hadoop -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <!-- hbase -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>${hbase.version}</version>
    </dependency>


</dependencies>
    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/test/java</testSourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.3.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                    <includeProjectDependencies>false</includeProjectDependencies>
                    <classpathScope>compile</classpathScope>
                    <mainClass>com.dt.spark.SparkApps.App</mainClass>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

2.3 Add the cluster configuration files to the project: copy the Hadoop and HBase configuration files into the resources folder.
(screenshot)
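
A quick way to confirm that the copied files are really on the classpath is to load them and print a couple of values. A minimal sketch, using standard Hadoop/HBase property names (the class name is just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShowClusterConf {
    public static void main(String[] args) {
        // core-site.xml on the classpath is loaded automatically by Configuration.
        Configuration hadoopConf = new Configuration();
        System.out.println("fs.defaultFS = " + hadoopConf.get("fs.defaultFS"));

        // hbase-site.xml on the classpath is loaded by HBaseConfiguration.create().
        Configuration hbaseConf = HBaseConfiguration.create();
        System.out.println("hbase.zookeeper.quorum = " + hbaseConf.get("hbase.zookeeper.quorum"));
    }
}

If both values match the cluster, the resources folder is set up correctly.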

Step 3: Write the code and implement the cases

Case 1: A WordCount example in Java

Prepare the test file.
I edited words.txt and put it on HDFS.
(screenshot)
View the contents of words.txt:
(screenshot)
The red box marks our words.txt; as you can see, the words are separated by spaces.
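
If you prefer to stage the file from code rather than with the hadoop command line, a minimal sketch using the Hadoop FileSystem API works too (the NameNode address and HDFS path are the ones used in the code below; the local path E:\words.txt and the user jzz are assumptions, adjust them to your setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class UploadWords {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster NameNode as the HDFS user that owns /user/jzz.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.10.82:8020"), conf, "jzz");

        // Copy the local test file to the path the WordCount job reads from.
        fs.copyFromLocalFile(new Path("E:\\words.txt"),
                new Path("/user/jzz/word/words.txt"));
        fs.close();
    }
}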

The JavaWordCount code is as follows:

package com.shanshu.demo;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

public class JavaWordCount {

    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {

        /*if (args.length < 1) {
            System.err.println("Usage: JavaWordCount <file>");
            System.exit(1);
        }*/

        // Point Hadoop at the local winutils installation from Step 0.
        System.setProperty("hadoop.home.dir","E:\\hadoop-2.7.3");

        // Connect to the remote standalone master; the driver itself runs locally in IDEA.
        SparkSession spark = SparkSession
                .builder().master("spark://192.168.10.84:7077")
                .appName("JavaWordCount")
                .getOrCreate();

        // Ship our packaged classes to the executors (the jar built in the packaging step below).
        spark.sparkContext()
                .addJar("E:\\myIDEA\\sparkDemo\\out\\artifacts\\sparkDemo_jar\\sparkDemo.jar");

        JavaRDD<String> lines = spark.read().textFile("hdfs://192.168.10.82:8020/user/jzz/word/words.txt").javaRDD();

        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(SPACE.split(s)).iterator();
            }
        });

        JavaPairRDD<String, Integer> ones = words.mapToPair(
                new PairFunction<String, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(String s) {
                        return new Tuple2<String, Integer>(s, 1);
                    }
                });

        JavaPairRDD<String, Integer> counts = ones.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer i1, Integer i2) {
                        return i1 + i2;
                    }
                });

        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<?,?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }
        spark.stop();
    }

}

Please note: because @Override is used in the code, you need to raise the Java language level, otherwise an error will be reported.
(screenshot)

Modify the Java version of the Project.
(screenshot)

Modify the Java version of the Module.
(screenshot)
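
As a side note, once the language level is 1.8 the three transformations in JavaWordCount can also be written with lambdas instead of anonymous inner classes. This is just an equivalent sketch, not a required change; the statements below can replace the flatMap/mapToPair/reduceByKey section of the class above:

// Java 8 lambda form of the same word count pipeline (Spark 2.x Java API).
JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new Tuple2<>(s, 1));
JavaPairRDD<String, Integer> counts = ones.reduceByKey((i1, i2) -> i1 + i2);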

Note: even though the code runs locally, the packaged jar must be added in the code (the addJar call above) so the cluster can load our classes. The jar packaging steps are as follows:

i. Add the artifact path
(screenshots)

Note: sometimes the generated jar needs to be copied to the cluster to run. To keep the built jar from getting too large, delete these dependency jars from the artifact.
(screenshot)

ii. Compile
(screenshots)

iii. The compiled result
(screenshot)

iv. Copy the path of the jar package on disk
(screenshot)

Write this path into the code (this must be done, otherwise it cannot run locally), as follows:
(screenshot)

v. Run the code; after a successful run you will see results like the following:
(screenshot)

Case 2: Read HBase data with Scala
Preparation: create a table named fruit in HBase with the column family info, and insert some data (already created here).
View it:
(screenshot)
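
For reference, the same preparation can also be scripted against the HBase 1.2 client API instead of the hbase shell. A minimal sketch (the row key 1001 and the sample values are made up for illustration; the ZooKeeper settings match the ones used in the Scala code below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateFruitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.10.82");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("fruit");
            // Create the table with a single column family "info" if it does not exist yet.
            if (!admin.tableExists(name)) {
                HTableDescriptor desc = new HTableDescriptor(name);
                desc.addFamily(new HColumnDescriptor("info"));
                admin.createTable(desc);
            }
            // Insert one sample row; the column names match what ReadHbase reads below.
            try (Table table = conn.getTable(name)) {
                Put put = new Put(Bytes.toBytes("1001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("apple"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("color"), Bytes.toBytes("red"));
                table.put(put);
            }
        }
    }
}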

i. Copy the HBase configuration file into the resources directory in IDEA.
(screenshot)

ii. Add the Scala SDK jars.
(screenshot)

iii. Create a scala folder under the main directory and mark it as a source root.
(screenshot)

iv. Copy the META-INF directory from the java directory into scala, delete the MANIFEST.MF file, and create a package.
(screenshot)

v. Write the Scala code.
(screenshot)

The code is as follows:

package com.shanshu.scala

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object ReadHbase {

  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()

    conf.set("hbase.zookeeper.property.clientPort","2181")
    conf.set("hbase.zookeeper.quorum","192.168.10.82")

    val sparkConf = new SparkConf().setMaster("local[3]").setAppName("readHbase")

    val sc = new SparkContext(sparkConf)
    // Set the name of the table to query
    conf.set(TableInputFormat.INPUT_TABLE, "fruit")
    val stuRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    // Iterate over the results and print them
    stuRDD.foreach({ case (_,result) =>
      val key = Bytes.toString(result.getRow)
      val name = Bytes.toString(result.getValue("info".getBytes,"name".getBytes))
      val color = Bytes.toString(result.getValue("info".getBytes,"color".getBytes))
      val num = Bytes.toString(result.getValue("info".getBytes,"num".getBytes))
      val people = Bytes.toString(result.getValue("info".getBytes,"people".getBytes))
      println("Row key:"+key+" Name:"+name+" color:"+color+" num:"+num+" people:"+people)
    })
    sc.stop()
  }
}

vi. Similarly, build the jar package (not necessary here; it is only needed when executing on the cluster).
Delete the previous jar artifact, because this time we have to choose the Scala main class.
(screenshot)

vii. The results of the run are as follows:
(screenshot)

Origin blog.csdn.net/babyhuang/article/details/78789920