Setting Up a Spark Development Environment on Ubuntu 16.04
This guide assumes Hadoop is already installed and covers installing Spark on top of it.
For the Hadoop installation itself, see the linked guide.
If you have not installed the JDK yet, do so now; you needed it for Hadoop as well.
Here are the preliminary steps in brief:
1. Install the JDK. Download jdk-8u111-linux-x64.tar.gz and unpack it to /opt/jdk1.8.0_111.
Download address: http://www.oracle.com/technetwork/java/javase/downloads/index.html
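A typical way to unpack it (assuming the tarball sits in the current directory):
sudo tar -zxvf jdk-8u111-linux-x64.tar.gz -C /opt/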
1) Configure the environment:
sudo vim /etc/profile
Append at the end:
export JAVA_HOME=/opt/jdk1.8.0_111
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH
2) Run source /etc/profile to make the configuration take effect.
3) Verify that Java is installed: java -version
If the Java version information is printed, the installation succeeded.
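If java -version is not found, double-check that the variables were actually picked up by the current shell:
echo $JAVA_HOME
which java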
Now for the Spark installation proper.
Scala is widely used and needs to be installed as well.
1. Install Scala. Download scala-2.12.3.tgz (make sure the version you download matches the paths below).
Download address: https://www.scala-lang.org/download/
After downloading, I extracted it to:
/usr/local/scala-2.12.3
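Unpacking follows the same pattern as the JDK:
sudo tar -zxvf scala-2.12.3.tgz -C /usr/local/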
Once it is unpacked, configure the environment variables. Run:
gedit ~/.bashrc
and append:
#scala
export SCALA_HOME=/usr/local/scala-2.12.3
export PATH=$PATH:$SCALA_HOME/bin
source ~/.bashrc
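Verify the Scala installation:
scala -version
It should report version 2.12.3.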
Step 2: Install Spark.
Download address: http://spark.apache.org/downloads.html
After downloading, unpack it. In my case the path is:
/usr/local/spark-2.2.0-bin-hadoop2.7
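Again, unpack in the same way:
sudo tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /usr/local/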
After unpacking, configure the environment variables:
1) Environment configuration:
sudo vim /etc/profile
Append at the end:
export SPARK_HOME=/usr/local/spark-2.2.0-bin-hadoop2.7
2) Run source /etc/profile to make the configuration take effect.
3) Test the installation.
Open a terminal and change to Spark's bin directory:
cd /usr/local/spark-2.2.0-bin-hadoop2.7/bin/
Run ./spark-shell to open an interactive Scala session connected to Spark. If no errors appear during startup and you get a scala> prompt, the launch succeeded.
You can also check the web UI by pointing a browser at: http://localhost:4040/
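Inside the shell you can run a quick sanity check; sc is the SparkContext that spark-shell creates automatically:
sc.version
It should print the Spark version, e.g. 2.2.0 here.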
Here is the source code of a Java word-count example:
package com.xiaoming.sparkdemo;

import java.util.Arrays;
import java.util.regex.Pattern;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

public class WordCount {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wc");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the input file from HDFS
        JavaRDD<String> text = sc.textFile("hdfs://192.168.56.128:9000/user/wangxiaoming/input/bank/892/1200/20170425");
        // Split each line into words (note: this is the Spark 1.x Java API;
        // against Spark 2.x, FlatMapFunction.call must return Iterator<String> instead of Iterable<String>)
        JavaRDD<String> words = text.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Iterable<String> call(String line) throws Exception {
                return Arrays.asList(line.split(" ")); // turn the line into a list of words
            }
        });
        // Map each word to a (word, 1) pair
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Tuple2<String, Integer> call(String word) throws Exception {
                return new Tuple2<String, Integer>(word, 1);
            }
        });
        // Sum the counts per word
        JavaPairRDD<String, Integer> results = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Integer call(Integer value1, Integer value2) throws Exception {
                return value1 + value2;
            }
        });
        // Swap to (count, word) so we can sort by count
        JavaPairRDD<Integer, String> temp = results.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Tuple2<Integer, String> call(Tuple2<String, Integer> tuple) throws Exception {
                return new Tuple2<Integer, String>(tuple._2, tuple._1);
            }
        });
        // Sort descending by count, then swap back to (word, count)
        JavaPairRDD<String, Integer> sorted = temp.sortByKey(false).mapToPair(new PairFunction<Tuple2<Integer, String>, String, Integer>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Tuple2<String, Integer> call(Tuple2<Integer, String> tuple) throws Exception {
                return new Tuple2<String, Integer>(tuple._2, tuple._1);
            }
        });
        // Print the results
        sorted.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 1L;
            @Override
            public void call(Tuple2<String, Integer> tuple) throws Exception {
                System.out.println("word:" + tuple._1 + " count:" + tuple._2);
            }
        });
        sc.close();
    }
}
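To run the job outside the IDE, package it into a jar (the Maven project at the end of this post produces one) and launch it with spark-submit. Note that setMaster("local") hard-coded above takes precedence over the --master flag, so remove that call if you want the standalone cluster to run the job. A sketch, assuming the assembled jar name below:
./bin/spark-submit --class com.xiaoming.sparkdemo.WordCount --master spark://192.168.56.128:7077 SparkAp-1.0.0.1-SNAPSHOT-jar-with-dependencies.jar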
Next, in the conf directory, create spark-env.sh from its template and edit it:
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Add the following configuration:
export SPARK_MASTER_IP=192.168.56.128
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export SPARK_MASTER_HOST=192.168.56.128
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=2G
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_SCALA_VERSION=2.11
Also in the conf directory, create spark-defaults.conf:
cp spark-defaults.conf.template spark-defaults.conf
Add the following configuration:
spark.master.ip 192.168.56.128   # this machine's IP
spark.master spark://192.168.56.128:7077
spark.driver.bindAddress 192.168.56.128
spark.driver.host 192.168.56.128
cp slaves.template slaves
vim slaves
Add the following line:
192.168.56.128   # the local IP, i.e. a pseudo-distributed setup
Start Spark by running start-all.sh from the sbin directory (it runs start-master.sh and then start-slaves.sh for you):
cd /usr/local/spark-2.2.0-bin-hadoop2.7/sbin
./start-all.sh
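To confirm the daemons came up, check the running JVM processes:
jps
jps should list a Master and a Worker process, and the standalone master's web UI should be reachable at http://192.168.56.128:8080/ (8080 is its default port).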
If you hit a permission problem when starting:
sudo chown -R wangxiaoming spark-2.2.0-bin-hadoop2.7/
If you see the error spark-shell: 29: set: Illegal option -o posix, do not launch the shell with sh spark-shell; run it as
./spark-shell
instead.
Using spark-shell:
In the bin directory, run ./spark-shell.
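spark-defaults.conf above already points the shell at the standalone master; you can also pin it explicitly with the --master flag:
./spark-shell --master spark://192.168.56.128:7077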
val textFile = sc.textFile("/user/wangxiaoming/test.txt")
Then try:
textFile.count()
textFile.first()
textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
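The same word count as the Java example can also be done interactively; a quick sketch, assuming textFile is the RDD loaded above:
textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b).collect()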
Finally, here is a sample Maven project (pom.xml). Note that the dependencies below target Spark 1.6.0 with Scala 2.10; adjust them to match the Spark and Scala versions you actually installed.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>xiaoming</groupId>
  <artifactId>SparkAp</artifactId>
  <version>1.0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>SparkAp</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-graphx_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <testSourceDirectory>src/main/test</testSourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass></mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.2.1</version>
        <executions>
          <execution>
            <goals>
              <goal>exec</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <executable>java</executable>
          <includeProjectDependencies>true</includeProjectDependencies>
          <includePluginDependencies>false</includePluginDependencies>
          <classpathScope>compile</classpathScope>
          <mainClass>com.xiaoming.sparkdemo.WordCount</mainClass>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
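Build the fat jar with:
mvn clean package
The assembly plugin bound to the package phase above produces a jar with the jar-with-dependencies suffix under target/, which is the one to hand to spark-submit.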