Getting Started with Spark Core (Part 1)

Tags (space-separated): Spark


  • 1. Spark overview
  • 2. Installing and configuring Spark
  • 3. WordCount with Spark
  • 4. Processing data with Spark
  • 5. Spark applications
  • 6. Log analysis with Spark
  • 7. Review

1. Spark overview

1.1 Where Spark comes from

Spark is a general-purpose parallel computing framework in the style of Hadoop MapReduce, open-sourced by the AMP Lab at UC Berkeley. It has the advantages of Hadoop MapReduce, but unlike MapReduce it can keep intermediate job results in memory, so there is no need to read and write HDFS between steps. This makes Spark far better suited to iterative algorithms such as those used in data mining and machine learning.
Spark is an open-source cluster computing environment similar to Hadoop, but with some useful differences that make it superior for certain workloads: Spark provides in-memory distributed datasets, and besides interactive queries it can also optimize iterative workloads.
Spark is implemented in Scala and uses Scala as its application framework. Unlike with Hadoop, Spark and Scala are tightly integrated, so Scala can manipulate distributed datasets as easily as local collection objects.
Although Spark was created to support iterative jobs on distributed datasets, it is actually a complement to Hadoop and can run in parallel on top of the Hadoop file system; this can be enabled through a third-party cluster framework called Mesos. Developed by the AMP Lab (Algorithms, Machines, and People Lab) at UC Berkeley, Spark can be used to build large, low-latency data analytics applications.

1.2 The Spark ecosystem

![image_1b40erb9d10q31t0qqdmsqs1lksm.png-75.8kB][1]


1.3 Spark vs. Hadoop MapReduce

MapReduce and the frameworks built around it:

 Hive       Storm       Mahout      Giraph

Spark Core and the components built on top of it:

 Spark SQL  Spark Streaming     Spark ML    Spark GraphX    SparkR

1.4 Where Spark can run

  A Spark application can run everywhere:
    local, YARN, Mesos, standalone, EC2, ...

![image_1b40f3h4j1c0m2au1qmlrk61a4s13.png-145.4kB][2]
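As a small illustration (a sketch, not part of the original walkthrough), the same application code chooses where it runs purely through the master URL handed to SparkConf; the host and port below are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The same job can run locally, on Standalone, or on YARN; only the master URL changes.
val conf = new SparkConf()
  .setAppName("WhereDoesSparkRun")
  .setMaster("local[2]")               // local mode, 2 threads
  // .setMaster("spark://host:7077")   // Standalone cluster (placeholder host)
  // for YARN the master is usually passed as: spark-submit --master yarn

val sc = new SparkContext(conf)
println(sc.master)
sc.stop()
```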

2. Installing and configuring Spark

2.1 With the Hadoop environment already set up, install scala-2.10.4.tgz

tar -zxvf scala-2.10.4.tgz -C /opt/modules
vim /etc/profile 

export JAVA_HOME=/opt/modules/jdk1.7.0_67
export HADOOP_HOME=/opt/modules/hadoop-2.5.0-cdh5.3.6
export SCALA_HOME=/opt/modules/scala-2.10.4
export SPARK_HOME=/opt/modules/spark-1.6.1-bin-2.5.0-cdh5.3.6

PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

2.2 Install spark-1.6.1-bin-2.5.0-cdh5.3.6.tgz

  tar -zxvf spark-1.6.1-bin-2.5.0-cdh5.3.6.tgz
  mv spark-1.6.1-bin-2.5.0-cdh5.3.6 /opt/modules
  cd /opt/modules/spark-1.6.1-bin-2.5.0-cdh5.3.6/conf 
  cp -p spark-env.sh.template spark-env.sh
  cp -p  log4j.properties.template  log4j.properties

  vim spark-env.sh 

Add:

JAVA_HOME=/opt/modules/jdk1.7.0_67
SCALA_HOME=/opt/modules/scala-2.10.4
HADOOP_CONF_DIR=/opt/modules/hadoop-2.5.0-cdh5.3.6/etc/hadoop

![image_1b40o15ugt8v1stklft1ahfm289.png-115.2kB][3]


2.3 Running the Spark shell

 Launch the Spark shell:

 bin/spark-shell 

![image_1b40oa3e217t01nuoqlp1tc01o69m.png-406.3kB][4]

2.4 Run against a test file:

 hdfs dfs -mkdir /input 

 hdfs dfs -put README.md /input 

Run some simple statistics:

scala> val rdd = sc.textFile("/input/README.md")

![image_1b40qa6ll9uojo45leq41ctb2a.png-232.9kB][5]

rdd.count (count the number of lines)
rdd.first (return the first line)
rdd.filter(line => line.contains("Spark")).count (count the lines that contain the string "Spark")

![image_1b40qb9vd2151l8o8kd189l4ll2n.png-458.2kB][6]

![image_1b40qbsttjgi1c4ng2st31b34.png-118kB][7]

scala> rdd.map(line => line.split(" ").size).reduce(_ + _)

![image_1b40qqkcf88v1rpvlks86q1kbv3h.png-240.3kB][8]

3. WordCount with Spark

3.1 Word count in the Spark shell

val rdd = sc.textFile("/input")    // read the files under /input into an RDD
rdd.collect                        // display the contents of the RDD
rdd.count                          // display the number of lines

![image_1b40roaqd1llq196lj4p1r8mfnk9.png-223.2kB][9]
![image_1b40rp3pi6ck12pi10k516bh1u3lm.png-908.7kB][10]

3.2 The three steps of processing data with Spark

Input:

scala> val rdd = sc.textFile("/input")    // (read the input data)

Process:

val WordCountRDD = rdd.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey((a, b) => (a + b))    // (process the data)

Shorthand:
 val WordCountRDD = rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

Output:

scala> WordCountRDD.saveAsTextFile("/output3")

scala> WordCountRDD.collect

![image_1b40sv5d9e1615l01jkv1chf1qbn13.png-223.5kB][11]

![image_1b40t01h51m2sb0l1qkb7285rn1g.png-77.2kB][12]

![image_1b40t1e44122419bra65141cj4d2a.png-800.8kB][13]
![image_1b40t3hfg1iln174g1nd411fo17c92n.png-133.7kB][14]

![image_1b40t3vonno21ipr1lfap7319j334.png-78.8kB][15]

![image_1b40t8vos2md1f7717l7k18136l3h.png-827.2kB][16]
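Put together, the three steps above look like this in spark-shell (a sketch assuming the /input directory created earlier and an output path that does not exist yet):

```scala
// input: read every file under /input into an RDD of lines
val rdd = sc.textFile("/input")

// process: split lines into words, pair each word with 1, sum the counts per word
val wordCountRDD = rdd
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

// output: write the result to HDFS and also bring it back to the driver
wordCountRDD.saveAsTextFile("/output3")
wordCountRDD.collect.foreach(println)
```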

4. Processing data with Spark

4.1 Computing statistics with Spark

Processing page view data with Spark:

hdfs dfs -mkdir /page
hdfs dfs -put page_views.data /page 
Read the data:
val rdd = sc.textFile("/page")

Process the data:
val PageRdd = rdd.map(line => line.split("\t")).map(arr => (arr(2), 1)).reduceByKey(_ + _) 

Take the first ten records:

PageRdd.take(10)

![image_1btqin3s91cjefam40b1tsp114k13.png-223.3kB][17]

![image_1btqinr0eda5207qta1ga01rcb1g.png-405.2kB][18]

![image_1btqiof408m11m02pjavr713f1t.png-264.1kB][19]

![image_1btqiosfacmk38b1alm8ea1ru22a.png-264kB][20]

Cache the data in memory:
rdd.cache
rdd.count 

rdd.map(line => line.split("\t")).map(arr => (arr(2), 1)).reduceByKey(_ + _).take(10)

![image_1btqj33pv1aji1cc66joo3617c934.png-110.3kB][22]
![image_1btqj7ki01r5t1q1d16751426j143h.png-622.3kB][23]
![image_1btqj89941qv9n4v11po1ifo11h3u.png-221.1kB][24]
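As a sketch, the caching pattern used above boils down to: mark the RDD as cached, run one action so it is actually materialized in memory, then reuse it in later jobs:

```scala
val rdd = sc.textFile("/page")

rdd.cache()    // mark the RDD for in-memory caching (lazy, nothing happens yet)
rdd.count()    // first action materializes the cache

// later jobs read the cached data instead of going back to HDFS
val top10 = rdd
  .map(_.split("\t"))
  .map(arr => (arr(2), 1))
  .reduceByKey(_ + _)
  .take(10)

top10.foreach(println)
```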

5. Spark applications

5.1 Spark run modes

A Spark application can run on:
  -1. YARN (currently the most common)
  -2. Standalone
      Spark's own distributed resource management and task scheduling
  -3. Mesos

 hadoop 2.x release 2.2.0 2013/10/15

 hadoop 2.0.x - alpha
 cloudera 2.1.x - beta

  cdh3.x - 0.20.2
  cdh4.x - 2.0.0
    hdfs -> HA: QJM; Federation
    Cloudera Manager 4.x
  cdh5.x

5.2 Spark Standalone mode

Standalone mode is Spark's own distributed resource management and task scheduling framework,

similar in structure to a framework like YARN:
   distributed
   master node:
     Master  -> ResourceManager
   worker nodes:
     Worker  -> NodeManager

   Open spark-env.sh
   and add at the end:
SPARK_MASTER_IP=192.168.3.1
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1 ## how many worker instances can run on each machine

   cd /soft/spark/conf
   cp -p slaves.template slaves
   echo "flyfish01.yangyang.com" > slaves 
------
Start Spark:

cd /soft/spark/sbin

start-slaves.sh 
  starts all the worker (slave) nodes
  Note: the machine on which this command runs must have passwordless SSH configured to the other nodes, otherwise you will run into problems during startup, such as being prompted for passwords.

./start-master.sh
./start-slaves.sh

![image_1btqlhj441a31q1i1ear3b91b0t4b.png-354.5kB][25]
![image_1btqlkhop7eplabtmt1nmq1h6c4o.png-156.3kB][26]
![image_1btqlldjn115mb6rj4t1pec1o9855.png-226.3kB][27]

Running a job on Standalone:

bin/spark-shell --master spark://192.168.3.1:7077

![image_1btqlud421ntv1e7i9vobri16fq5v.png-402.7kB][28]
![image_1btqlutdp1q15ki7130dv35lu6c.png-151.6kB][29]

5.3 Running on Standalone

Read the data:
val rdd = sc.textFile("/page")

Process the data:
val PageRdd = rdd.map(line => line.split("\t")).map(arr => (arr(2), 1)).reduceByKey(_ + _)

Take the first ten records:

PageRdd.take(10)

![image_1btqm9vgb2u6hr01rjs1eqn1kbc6p.png-222.4kB][30]
![image_1btqmb3hua17tdlopvhf21rj97m.png-95.8kB][31]
![image_1btqmbec81m7a10estl31bu3fmq83.png-227.9kB][32]
![image_1btqmbpv3goc6m912mg135t14uk8g.png-199.3kB][33]
![image_1btqmcmid1o4f1tln146nii171l8t.png-233.6kB][34]
![image_1btqmdimmi7hr9okh0rhqjs49a.png-243.3kB][35]
![image_1btqme14ko7p1or7gas1gk5isf9n.png-222.3kB][36]
![image_1btqnijbp19i8lfu15728pi1t3hbe.png-159.3kB][37]
### 5.4 A Spark application consists of two parts
  • 1. Driver program -> web UI ports 4040 / 4041 / 4042
    the main method
    SparkContext -- the most important object

  • 2. Executors
    each Executor is a JVM (a process)
    runs the tasks of our jobs

    REPL: the interactive shell

    A Spark application breaks down as:
    job-01
      count
    job-02
      stage-01
        task-01 (thread) -> comparable to a map task (a process) in MapReduce
        task-02 (thread) -> comparable to a map task (a process) in MapReduce
      all tasks in a stage run the same logic, just over different data
      stage-02

    job-03

    From the programs we ran above:
    whenever a function called on an RDD returns something that is not an RDD, a job is triggered and executed (see the sketch below).
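A quick way to see this in spark-shell (a sketch using the /input directory from earlier): transformations are lazy and only extend the RDD lineage, while an action such as count, which returns a plain value rather than an RDD, actually submits a job.

```scala
val lines  = sc.textFile("/input")                   // transformation: no job yet
val words  = lines.flatMap(_.split(" "))             // transformation: no job yet
val counts = words.map((_, 1)).reduceByKey(_ + _)    // transformation: no job yet

println(counts.toDebugString)   // print the lineage; shuffle boundaries become stage boundaries
counts.count()                  // action: returns a Long, so a job is triggered
```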

Food for thought:
what does reduceByKey actually do?

-1. grouping
    values that share the same key are brought together
-2. reducing the values
    each group of values is combined with the reduce function
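A tiny demonstration of both steps (a sketch using sc.parallelize on a handful of pairs):

```scala
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// step 1: values with the same key are grouped: a -> (1, 3), b -> (2)
// step 2: each group is reduced with the given function: a -> 4, b -> 2
pairs.reduceByKey(_ + _).collect()   // Array((a,4), (b,2))
```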

By analysing this and comparing it with how the WordCount program runs in MapReduce, we can infer that stages within a Spark job are divided wherever a shuffle occurs between RDDs.

![image_1btqmk5k0161l1qk1udq1c6t1l70a4.png-237.5kB][38]
![image_1btqn68elc5c4921dfve5ll25ah.png-213.1kB][39]

Query in descending order of count:
val rdd = sc.textFile("/input")
val WordContRdd = rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
val sortRdd = WordContRdd.map(tuple => (tuple._2, tuple._1)).sortByKey(false)
sortRdd.collect
sortRdd.take(3)
sortRdd.take(3).map(tuple => (tuple._2, tuple._1))
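An equivalent way to write the descending sort (a sketch; sortBy is part of the RDD API) is to sort directly by the count field instead of swapping the tuple twice:

```scala
// sort by the second tuple element (the count) in descending order
val top3 = WordContRdd.sortBy(tuple => tuple._2, ascending = false).take(3)
top3.foreach(println)
```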

![image_1btqp481vu2g1mur1gd1td9g2hbr.png-247.2kB][40]

![image_1btqp5g401hg679vcdg524g9d8.png-98.8kB][41]

![image_1btqp6g6l1ui91pm77mb1v5n1rs7dl.png-559.1kB][42]

![image_1btqp72rh15ik3e27ln11mojthe2.png-286.8kB][43]

![image_1btqp7o251g05ksijvt1qh11v4sef.png-426.9kB][44]

![image_1btqp8bk61vbh74enlu1cbu652es.png-351.2kB][45]

![image_1btqp8vm6638545det164rrbf9.png-212.9kB][46]

Implicit conversions in Scala:
an implicit conversion turns a value of one type into a value of another type.
Implicit functions are declared with:
implicit def
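A minimal, self-contained sketch of an implicit conversion (unrelated to Spark itself, just to show the mechanism):

```scala
object ImplicitDemo {
  class RichInt(val self: Int) {
    // run a block of code `self` times
    def times(f: => Unit): Unit = (1 to self).foreach(_ => f)
  }

  // the compiler inserts this conversion automatically when
  // a RichInt method is called on a plain Int
  implicit def intToRichInt(i: Int): RichInt = new RichInt(i)

  def main(args: Array[String]): Unit = {
    3.times(println("hello"))   // compiles because of the implicit def
  }
}
```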

### 5.5 Developing Spark jobs in practice

How to develop a Spark application:

spark-shell + IDEA

-1. Write the code in IDEA

-2. Run and verify the code in spark-shell

-3. Package the code into a jar with IDEA and submit it with bin/spark-submit


### 5.6 Developing with IDEA on Linux
    Take the top 10 out of 100,000 records

```scala
package com.ibeifeng.bigdata.senior.core

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
  * Created by root on 17-11-2.
  * Driver Program
  */
object SparkApp {

  def main(args: Array[String]) {

    // step 0: SparkConf
    val sparkConf = new SparkConf()
      .setAppName("SparkApplication")
      .setMaster("local[2]")

    // create SparkContext
    val sc = new SparkContext(sparkConf)

    //*=========================================*/

    // step 1: input data
    val rdd = sc.textFile("/page/page_views.data")

    // step 2: process data
    val pageWordRddTop10 = rdd
      .map(line => line.split("\t"))
      .map(x => (x(2), 1))
      .reduceByKey(_ + _)
      .map(tuple => (tuple._2, tuple._1))
      .sortByKey(false)
      .take(10)

    // step 3: output data
    pageWordRddTop10.foreach(println(_))

    //*=========================================*/

    // close spark
    sc.stop()
  }
}
```

![image_1btt7cn2ov6v1oqj1i3h12s51jlp9.png-176kB][47]

### 5.7 Packaging the code into a jar and running it

![image_1btt7p7e0t2hok4135n1oha1k619.png-262.8kB][48]

![image_1btt7rpep1niaa17rin116rn73m.png-304.6kB][49]

![image_1btt7tvej12u9kuofa5i5nmbs13.png-342kB][50]

![image_1btt82ao11a0saie1thh108ve8d1g.png-406.7kB][51]

![image_1btt84pt74j514mr991ng41h3m1t.png-354kB][52]

![image_1btt85jdd1hb91nmass8m19dr72a.png-360.8kB][53]

![image_1btt894q2kn71k0t1dth1kts1mtu2n.png-540.4kB][54]

![image_1btt8b81a1n336so19vhnci15at34.png-271.4kB][55]

![image_1btt8c7n2vgmrik10m82avu741.png-170.1kB][56]

![image_1btt8dd7ov6g36k17bepp8ue4u.png-171.5kB][57]

![image_1btt8ef5jg5i10ao1dcr1l4surc5b.png-109.4kB][58]

### 5.8 Submitting a Spark job

Running in local mode:

bin/spark-submit Scala_Project.jar

![image_1btt92m7bu6c1hs916k611n6iu55o.png-271kB][59]

![image_1btt93bqe1kvg1dnpddq9231tch65.png-320.8kB][60]

Running on Standalone:

![image_1btt998qio8vngo6882qj15kd6i.png-537.9kB][61]

![image_1btt9c12sgevk9d11v312uue9k6v.png-254.9kB][62]

![image_1btt9ck8mg0ergftghvsk1c407c.png-106.4kB][63]

Start Spark Standalone:

sbin/start-master.sh
sbin/start-slaves.sh

![image_1btt9fo8f10jc1v7j135okol15ts7p.png-197.8kB][64]

![image_1btt9iaav1t8rftsl5413je197ra6.png-312.2kB][65]

bin/spark-submit --master spark://192.168.3.1:7077 Scala_Project.jar

![image_1btt9mub7cgg2et1e5r1irt5nmaj.png-554.6kB][66]

![image_1btt9nkq75uh15m014rqaii2htb0.png-226.9kB][67]

![image_1btt9o8ms1d7tmii1crlcov18lhbd.png-358.8kB][68]

### 5.9 Configuring the Spark history server

The history server lets you monitor Spark applications after they have finished running.

The setup has two parts:

Part 1: configure the Spark application to record event logs while it runs

Part 2: start the history server and browse the recorded logs through its web UI

------

Configure the history server:

cd /soft/spark/conf

cp -p spark-defaults.conf.template spark-defaults.conf

vim spark-defaults.conf

spark.master spark://192.168.3.1:7077
spark.eventLog.enabled true

spark.eventLog.dir hdfs://192.168.3.1:8020/SparkJobLogs

spark.eventLog.compress true

Start spark-shell:
bin/spark-shell

![image_1bttakmgv17b0ig01tqb7o3qaibq.png-397kB][69]

![image_1bttamgcdposngoo8suvb16thd7.png-396.3kB][70]

![image_1bttaofol1mi71b1qnr7lt416lmdk.png-150.2kB][71]

bin/spark-submit --master spark://192.168.3.1:7077 Scala_Project.jar

![image_1bttbbcutkflom1irq1pcm1d5e1.png-264.7kB][72]

![image_1bttbc1s110191g8h17u1ji21bdqee.png-80.5kB][73]

![image_1bttbco9e1o5v60hep3d91v73er.png-263.1kB][74]

![image_1bttbdbapgiq8i51af6d5f1avff8.png-305.7kB][75]
![image_1bttbec3t5cn1eac1co51gsd324fl.png-227.2kB][76]

Configure the history server daemon in Spark:

vim spark-env.sh

SPARK_MASTER_IP=192.168.3.1
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1 ## how many worker instances can run on each machine

# add:
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://flyfish01.yangyang.com:8020/SparkJobLogs -Dspark.history.fs.cleaner.enabled=true"

-------------

# start the history server

cd /soft/spark
sbin/start-history-server.sh

![image_1bttc313v58p9so168t2e83gggi.png-103.6kB][77]

![image_1bttcac0b1jn51to41eogiq31n20gv.png-208.5kB][78]

![image_1bttcckkcafj1dhl1l2fj406b9hc.png-373.2kB][79]

![image_1bttcehdv1dea14qrp8t127v1ueghp.png-367.4kB][80]

---
### 6. Log analysis with Spark

Requirement 1:
The average, min, and max content size of responses returned from the server.

ContentSize

Requirement 2:
A count of the response codes returned.

responseCode

Requirement 3:
All IP addresses that have accessed this server more than N times.

ipAddresses

Requirement 4:
The top endpoints requested by count.

endPoint

### 6.1 Creating the project with Maven
#### 6.1.1 Create the project from the command line

mvn archetype:generate -DarchetypeGroupId=org.scala-tools.archetypes -DarchetypeArtifactId=scala-archetype-simple -DremoteRepositories=http://scala-tools.org/repo-releases -DgroupId=com.ibeifeng.bigdata.spark.app -DartifactId=log-analyzer -Dversion=1.0

#### 6.1.2 Import the project

![image_1bttptu0g1mip1906188j1mtn1pavme.png-67.3kB][81]

![image_1bttpumtj6151ofu15e8igs1a9smr.png-151.8kB][82]

![image_1bttpveo11qng67s1p7a19eo1m1nn8.png-81kB][83]

![image_1bttq017jfv6ne83c44516mtnl.png-174kB][84]

![image_1bttq0hb162d145a1sh8qraphbo2.png-75.3kB][85]

![image_1bttq12k21mom9um1nhpov01begof.png-251.4kB][86]

![image_1bttq22lq9pd9pgb71jo91cldos.png-73.5kB][87]

![image_1bttq6ti46o2qcj1n851ecuu90p9.png-195.6kB][88]

#### 6.1.3 The pom.xml file

```
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.ibeifeng.bigdata.spark.app</groupId>
  <artifactId>log-analyzer</artifactId>
  <version>1.0</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2010</inceptionYear>

  <properties>
    <encoding>UTF-8</encoding>
    <hadoop.version>2.5.0</hadoop.version>
    <spark.version>1.6.1</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>2.15.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-make:transitive</arg>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```

#### 6.1.4 Add the Scala jars to the project

![image_1bttqb9bu1vp01dv9ie110s1conpm.png-353.7kB][89]

#### 6.1.5 Create LogAnalyzer.scala

```scala
package com.ibeifeng.bigdata.spark.app.core

import org.apache.spark.{SparkContext, SparkConf}

/**
  * Created by zhangyy on 2016/7/16.
  */
object LogAnalyzer {

  def main(args: Array[String]) {

    // step 0: SparkContext
    val sparkConf = new SparkConf()
      .setAppName("LogAnalyzer Application") // name
      .setMaster("local[2]") // --master local[2] | spark://xx:7077 | yarn

    // Create SparkContext
    val sc = new SparkContext(sparkConf)

    /** ================================================================== */
    val logFile = "/logs/apache.access.log"

    // step 1: input data
    val accessLogs = sc.textFile(logFile)
      // parse log
      .map(line => ApacheAccessLog.parseLogLine(line))

    /**
      * The average, min, and max content size of responses returned from the server.
      */
    val contentSizes = accessLogs.map(log => log.contentSize)

    // compute
    val avgContentSize = contentSizes.reduce(_ + _) / contentSizes.count()
    val minContentSize = contentSizes.min()
    val maxContentSize = contentSizes.max()

    // println
    printf("Content Size Avg: %s , Min : %s , Max: %s".format(
      avgContentSize, minContentSize, maxContentSize
    ))

    /**
      * A count of the response codes returned.
      */
    val responseCodeToCount = accessLogs
      .map(log => (log.responseCode, 1))
      .reduceByKey(_ + _)
      .take(3)

    println(
      s"""Response Code Count: ${responseCodeToCount.mkString(", ")}"""
    )

    /**
      * All IP addresses that have accessed this server more than N times.
      */
    val ipAddresses = accessLogs
      .map(log => (log.ipAddress, 1))
      .reduceByKey(_ + _)
      // .filter(x => (x._2 > 10))
      .take(5)

    println(
      s"""IP Address : ${ipAddresses.mkString("< ", ", ", " >")}"""
    )

    /**
      * The top endpoints requested by count.
      */
    val topEndpoints = accessLogs
      .map(log => (log.endPoint, 1))
      .reduceByKey(_ + _)
      .map(tuple => (tuple._2, tuple._1))
      .sortByKey(false)
      .take(3)
      .map(tuple => (tuple._2, tuple._1))

    println(
      s"""Top Endpoints : ${topEndpoints.mkString("[", ", ", " ]")}"""
    )

    /** ================================================================== */

    // Stop SparkContext
    sc.stop()
  }
}
```

#### 6.1.6 Create the log-parsing class ApacheAccessLog.scala

```scala
package com.ibeifeng.bigdata.spark.app.core

/**
  * Created by zhangyy on 2016/7/16.
  *
  * 1.1.1.1 - - [21/Jul/2014:10:00:00 -0800]
  * "GET /chapter1/java/src/main/java/com/databricks/apps/logs/LogAnalyzer.java HTTP/1.1"
  * 200 1234
  */
case class ApacheAccessLog(
  ipAddress: String,
  clientIndentd: String,
  userId: String,
  dateTime: String,
  method: String,
  endPoint: String,
  protocol: String,
  responseCode: Int,
  contentSize: Long)

object ApacheAccessLog {

  // regex
  // 1.1.1.1 - - [21/Jul/2014:10:00:00 -0800] "GET /chapter1/java/src/main/java/com/databricks/apps/logs/LogAnalyzer.java HTTP/1.1" 200 1234
  val PARTTERN = """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

  /**
    * @param log a raw access-log line
    * @return the parsed ApacheAccessLog
    */
  def parseLogLine(log: String): ApacheAccessLog = {
    // parse log
    val res = PARTTERN.findFirstMatchIn(log)

    // invalidate
    if (res.isEmpty) {
      throw new RuntimeException("Cannot parse log line: " + log)
    }

    // get value
    val m = res.get

    // return
    ApacheAccessLog(
      m.group(1),
      m.group(2),
      m.group(3),
      m.group(4),
      m.group(5),
      m.group(6),
      m.group(7),
      m.group(8).toInt,
      m.group(9).toLong)
  }
}
```

#### 6.1.7 Error

```
Exception in thread "main" java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package
    at java.lang.ClassLoader.checkCerts(ClassLoader.java:952)
    at java.lang.ClassLoader.preDefineClass(ClassLoader.java:666)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:794)

-----
Fix: remove the javax.servlet-xxxx.api Maven dependency.
```

#### 6.1.8 Output

![image_1btu0d0qtcov1b241sdl1jsn1efqj.png-113.3kB][90]

![image_1btu0e2t6jr82nnlfq15mt17oarg.png-91.6kB][91]

![image_1btu0er155qe17bs1o0u1sl417kert.png-127.1kB][92]

![image_1btu0g7ff1c2p10mb1pdt1jsu1rs5sq.png-99.6kB][93]

### 7. Review

```
Review:
-1. Getting to know Spark
    Compared with MapReduce, the "four big advantages":
    --1. faster
    --2. easier to use
    --3. a one-stack platform
    --4. runs everywhere
    Development and testing: Scala REPL / Python

-2. The two core abstractions of Spark Core
    --1. RDD
        a collection that stores data of different types - like a List
        ---1. kept in memory
        ---2. partitioned, like HDFS blocks
        ---3. a function is applied to the data of each partition
    --2. shared variables
        ---1. broadcast variables
        ---2. accumulators (counters)

-3. Environment and development
    --1. Local mode: spark-shell
    --2. Spark Standalone: configure, start, monitor, use
    --3. HistoryServer
        -1. whether each application records an event log
        -2. the HistoryServer displays the recorded logs
    --4. How to develop a Spark application in an IDE
        -1. Scala project: how to add the Spark jars
        -2. Maven project

=================================================
Spark development:
step 1: input data   -> rdd / dataframe
step 2: process data -> rdd#xx() / df#xx | "select xx, * from xx ..."
step 3: output data  -> rdd.saveXxxx / df.write.jdbc/json/xxx
```

[1]: http://static.zybuluo.com/zhangyy/6q90psdonlak3rxq4ffdo85u/image_1b40erb9d10q31t0qqdmsqs1lksm.png
[2]: http://static.zybuluo.com/zhangyy/qa05kfxpdflhuq8bb85efeho/image_1b40f3h4j1c0m2au1qmlrk61a4s13.png
[3]: http://static.zybuluo.com/zhangyy/hvtn541d9t7zj7aktp901m95/image_1b40o15ugt8v1stklft1ahfm289.png
[4]: http://static.zybuluo.com/zhangyy/3ek2mfzx401thwmd4zrlzd9e/image_1b40oa3e217t01nuoqlp1tc01o69m.png
[5]: http://static.zybuluo.com/zhangyy/b98wfoyplasldtdrtgnu85tq/image_1b40qa6ll9uojo45leq41ctb2a.png
[6]: http://static.zybuluo.com/zhangyy/p1id58aa51tqrfewioktx49h/image_1b40qb9vd2151l8o8kd189l4ll2n.png
[7]: http://static.zybuluo.com/zhangyy/o4vxid7fbry4zvy470paiswa/image_1b40qbsttjgi1c4ng2st31b34.png
[8]: http://static.zybuluo.com/zhangyy/qj9gp1gq9hdiozciww37w7o0/image_1b40qqkcf88v1rpvlks86q1kbv3h.png
[9]: http://static.zybuluo.com/zhangyy/kaxibg57kyjuxru4p63vvvse/image_1b40roaqd1llq196lj4p1r8mfnk9.png
[10]: http://static.zybuluo.com/zhangyy/e2iwj5eow8xc1ok01dwljjt9/image_1b40rp3pi6ck12pi10k516bh1u3lm.png
[11]: http://static.zybuluo.com/zhangyy/5rz9584txris23ess4mtm6nr/image_1b40sv5d9e1615l01jkv1chf1qbn13.png
[12]: http://static.zybuluo.com/zhangyy/wur90svx35xcthq3p44a7yui/image_1b40t01h51m2sb0l1qkb7285rn1g.png
[13]: http://static.zybuluo.com/zhangyy/35ar8v5x7kfueky4dv84fscr/image_1b40t1e44122419bra65141cj4d2a.png
[14]: http://static.zybuluo.com/zhangyy/kzl6vdbcsix3mtznqxmd1vvu/image_1b40t3hfg1iln174g1nd411fo17c92n.png
[15]: http://static.zybuluo.com/zhangyy/dxa91l1ukwkqylwr3cr8bs65/image_1b40t3vonno21ipr1lfap7319j334.png
[16]: http://static.zybuluo.com/zhangyy/plmv2dg12ukvzosw7k3otn3x/image_1b40t8vos2md1f7717l7k18136l3h.png
[17]: http://static.zybuluo.com/zhangyy/dangh0niperarcd1mco91h50/image_1btqin3s91cjefam40b1tsp114k13.png
[18]: http://static.zybuluo.com/zhangyy/fmmhk46qshanvmvrvd478buj/image_1btqinr0eda5207qta1ga01rcb1g.png
[19]: http://static.zybuluo.com/zhangyy/4s5t3w4drr6jd0ktjb164dhm/image_1btqiof408m11m02pjavr713f1t.png
[20]: http://static.zybuluo.com/zhangyy/ho4mqudb0kr6pql4wf92jlf1/image_1btqiosfacmk38b1alm8ea1ru22a.png
[21]: http://static.zybuluo.com/zhangyy/obh7efkgo2uod7buih7b5n7g/image_1btqivgm2193o1oo81qle1rt71p2h2n
[22]: http://static.zybuluo.com/zhangyy/hd9s73spqach1f399v30tyh3/image_1btqj33pv1aji1cc66joo3617c934.png
[23]: http://static.zybuluo.com/zhangyy/qijbi9046hca29ebctc3c9v4/image_1btqj7ki01r5t1q1d16751426j143h.png
[24]: http://static.zybuluo.com/zhangyy/fknorju7586ukw8tt2y1xip0/image_1btqj89941qv9n4v11po1ifo11h3u.png
[25]: http://static.zybuluo.com/zhangyy/y9gowe7ualau71ndzy22lg3s/image_1btqlhj441a31q1i1ear3b91b0t4b.png
[26]: http://static.zybuluo.com/zhangyy/aiaugk4lf65mrvu1fe2290mx/image_1btqlkhop7eplabtmt1nmq1h6c4o.png
[27]: http://static.zybuluo.com/zhangyy/3d2jhmc0w9clrgzgpno5dg0y/image_1btqlldjn115mb6rj4t1pec1o9855.png
[28]: http://static.zybuluo.com/zhangyy/g8lgkzt1vu68ygbqio3zgw2d/image_1btqlud421ntv1e7i9vobri16fq5v.png
[29]: http://static.zybuluo.com/zhangyy/9dm624219imhnxdhf8iu3ocu/image_1btqlutdp1q15ki7130dv35lu6c.png
[30]: http://static.zybuluo.com/zhangyy/stisrxwvsiq2qdpcqy0p7i49/image_1btqm9vgb2u6hr01rjs1eqn1kbc6p.png
[31]: http://static.zybuluo.com/zhangyy/ntkn8dajoimix92jnw92xz8a/image_1btqmb3hua17tdlopvhf21rj97m.png
[32]: http://static.zybuluo.com/zhangyy/743a1e36p5ry6nzcayhi4r5g/image_1btqmbec81m7a10estl31bu3fmq83.png
[33]: http://static.zybuluo.com/zhangyy/tbo2w3qcks9ivxhpyfub53z4/image_1btqmbpv3goc6m912mg135t14uk8g.png
[34]: http://static.zybuluo.com/zhangyy/ijk4njc21mxva33feksv9tl5/image_1btqmcmid1o4f1tln146nii171l8t.png
[35]: http://static.zybuluo.com/zhangyy/hqnik376clwci8h3zau529y1/image_1btqmdimmi7hr9okh0rhqjs49a.png
[36]: http://static.zybuluo.com/zhangyy/v9cfw6w42e1e3k050232zha8/image_1btqme14ko7p1or7gas1gk5isf9n.png
[37]: http://static.zybuluo.com/zhangyy/alh7lwvk20wz2lz6lezbkqsc/image_1btqnijbp19i8lfu15728pi1t3hbe.png
[38]: http://static.zybuluo.com/zhangyy/xsgdqyyddcqiqb3grn74j0ez/image_1btqmk5k0161l1qk1udq1c6t1l70a4.png
[39]: http://static.zybuluo.com/zhangyy/d5ocihili46pa9977a3lx6gq/image_1btqn68elc5c4921dfve5ll25ah.png
[40]: http://static.zybuluo.com/zhangyy/naactt5iix57g27qy4hqyz7r/image_1btqp481vu2g1mur1gd1td9g2hbr.png
[41]: http://static.zybuluo.com/zhangyy/j6g3a8sb2kizn2rlar906dwq/image_1btqp5g401hg679vcdg524g9d8.png
[42]: http://static.zybuluo.com/zhangyy/v50slqsr4t8z2gcibaq150my/image_1btqp6g6l1ui91pm77mb1v5n1rs7dl.png
[43]: http://static.zybuluo.com/zhangyy/prcqxi34j844aha1z9k3tone/image_1btqp72rh15ik3e27ln11mojthe2.png
[44]: http://static.zybuluo.com/zhangyy/3pcd4k60ahai9vtzp2jgogjq/image_1btqp7o251g05ksijvt1qh11v4sef.png
[45]: http://static.zybuluo.com/zhangyy/2cbxaxds4048g0zly9vwd1b2/image_1btqp8bk61vbh74enlu1cbu652es.png
[46]: http://static.zybuluo.com/zhangyy/hqr6meo72254qxrsr9nqddoo/image_1btqp8vm6638545det164rrbf9.png
[47]: http://static.zybuluo.com/zhangyy/s0ebmtmh955ym2diwvjli6ry/image_1btt7cn2ov6v1oqj1i3h12s51jlp9.png
[48]: http://static.zybuluo.com/zhangyy/ufkxxsbid64u78q286b9m7yn/image_1btt7p7e0t2hok4135n1oha1k619.png
[49]: http://static.zybuluo.com/zhangyy/t0r2m9ifvvq67p34kcafxiyl/image_1btt7rpep1niaa17rin116rn73m.png
[50]: http://static.zybuluo.com/zhangyy/es6vkivnp90evpndgivu23fg/image_1btt7tvej12u9kuofa5i5nmbs13.png
[51]: http://static.zybuluo.com/zhangyy/izq89kwz1dguj5k0hkffk25h/image_1btt82ao11a0saie1thh108ve8d1g.png
[52]: http://static.zybuluo.com/zhangyy/vzppvyc5sr0v1v10n56qsfeo/image_1btt84pt74j514mr991ng41h3m1t.png
[53]: http://static.zybuluo.com/zhangyy/lbi7vugdxun71uclhit4yi31/image_1btt85jdd1hb91nmass8m19dr72a.png
[54]: http://static.zybuluo.com/zhangyy/x2grzsdipu6j0pyy59v9hh99/image_1btt894q2kn71k0t1dth1kts1mtu2n.png
[55]: http://static.zybuluo.com/zhangyy/800aoygljxaqigo29bzv7vwj/image_1btt8b81a1n336so19vhnci15at34.png
[56]: http://static.zybuluo.com/zhangyy/l00fmalnjm4orfw12kwltfwp/image_1btt8c7n2vgmrik10m82avu741.png
[57]: http://static.zybuluo.com/zhangyy/qs6vpr4sa14v8x2atrawghx2/image_1btt8dd7ov6g36k17bepp8ue4u.png
[58]: http://static.zybuluo.com/zhangyy/06ecld19dvrsgjx6rg2nagv4/image_1btt8ef5jg5i10ao1dcr1l4surc5b.png
[59]: http://static.zybuluo.com/zhangyy/xqm5bzity0ck4e8boouoaduj/image_1btt92m7bu6c1hs916k611n6iu55o.png
[60]: http://static.zybuluo.com/zhangyy/uei48fxq865uuwx90xyg9uwy/image_1btt93bqe1kvg1dnpddq9231tch65.png
[61]: http://static.zybuluo.com/zhangyy/xmxyvlg167mvdcxo95sbd832/image_1btt998qio8vngo6882qj15kd6i.png
[62]: http://static.zybuluo.com/zhangyy/z32t204no8glsnsrgg92i4rk/image_1btt9c12sgevk9d11v312uue9k6v.png
[63]: http://static.zybuluo.com/zhangyy/2w469h6u82yo94dluxm8byfq/image_1btt9ck8mg0ergftghvsk1c407c.png
[64]: http://static.zybuluo.com/zhangyy/iwusilg4m641ps8rh88abol9/image_1btt9fo8f10jc1v7j135okol15ts7p.png
[65]: http://static.zybuluo.com/zhangyy/3v418c73ztqq76a2vi5vu16j/image_1btt9iaav1t8rftsl5413je197ra6.png
[66]: http://static.zybuluo.com/zhangyy/02x77dz0kg3szzzvx4pq1n1k/image_1btt9mub7cgg2et1e5r1irt5nmaj.png
[67]: http://static.zybuluo.com/zhangyy/wy8o5oy7oect1tmicoocq3o6/image_1btt9nkq75uh15m014rqaii2htb0.png
[68]: http://static.zybuluo.com/zhangyy/2q2ktffuz7t5ec01olopxn6k/image_1btt9o8ms1d7tmii1crlcov18lhbd.png
[69]: http://static.zybuluo.com/zhangyy/o279fsk1luxs8bq2rqtdoy10/image_1bttakmgv17b0ig01tqb7o3qaibq.png
[70]: http://static.zybuluo.com/zhangyy/xe594fl7v2w09z4a0arksc96/image_1bttamgcdposngoo8suvb16thd7.png
[71]: http://static.zybuluo.com/zhangyy/3kz2lzazef1ya1ny16lbf3ri/image_1bttaofol1mi71b1qnr7lt416lmdk.png
[72]: http://static.zybuluo.com/zhangyy/zo61y311qjt1z6333d7gh3kn/image_1bttbbcutkflom1irq1pcm1d5e1.png
[73]: http://static.zybuluo.com/zhangyy/mrnj1iiojapt9acwt2brcqnn/image_1bttbc1s110191g8h17u1ji21bdqee.png
[74]: http://static.zybuluo.com/zhangyy/glzocgex0nfsfafuqeki6amd/image_1bttbco9e1o5v60hep3d91v73er.png
[75]: http://static.zybuluo.com/zhangyy/zxwz5ab7ye4v29iqybme02wn/image_1bttbdbapgiq8i51af6d5f1avff8.png
[76]: http://static.zybuluo.com/zhangyy/g9husb9oznqin5w61gl6g5ol/image_1bttbec3t5cn1eac1co51gsd324fl.png
[77]: http://static.zybuluo.com/zhangyy/zugb4bdmcw3gei9euar7ltsy/image_1bttc313v58p9so168t2e83gggi.png
[78]: http://static.zybuluo.com/zhangyy/caputgef88mi5w2psn359lnk/image_1bttcac0b1jn51to41eogiq31n20gv.png
[79]: http://static.zybuluo.com/zhangyy/pah2hwo7ks628xg82f5k712k/image_1bttcckkcafj1dhl1l2fj406b9hc.png
[80]: http://static.zybuluo.com/zhangyy/43o9izshlx36dut2q2ov74jr/image_1bttcehdv1dea14qrp8t127v1ueghp.png
[81]: http://static.zybuluo.com/zhangyy/z0vzd7r3nvcc21iy5w4y18xh/image_1bttptu0g1mip1906188j1mtn1pavme.png
[82]: http://static.zybuluo.com/zhangyy/i4d7u8e671g747973dvem6r4/image_1bttpumtj6151ofu15e8igs1a9smr.png
[83]: http://static.zybuluo.com/zhangyy/o8g6ylhpmaecm33tdnzlx227/image_1bttpveo11qng67s1p7a19eo1m1nn8.png
[84]: http://static.zybuluo.com/zhangyy/k6ptbg2r14f76ukzlwjt2owc/image_1bttq017jfv6ne83c44516mtnl.png
[85]: http://static.zybuluo.com/zhangyy/pbg510jzmev2c1vn7jr0xpr5/image_1bttq0hb162d145a1sh8qraphbo2.png
[86]: http://static.zybuluo.com/zhangyy/a5qe0skrghrjms62lupwf1xz/image_1bttq12k21mom9um1nhpov01begof.png
[87]: http://static.zybuluo.com/zhangyy/pu3zc2bhm5f1tzaqv3smrf1j/image_1bttq22lq9pd9pgb71jo91cldos.png
[88]: http://static.zybuluo.com/zhangyy/v8qu1upqw9fpsle3erx24qxq/image_1bttq6ti46o2qcj1n851ecuu90p9.png
[89]: http://static.zybuluo.com/zhangyy/k4yof0t1rgdckkrpslqrh32s/image_1bttqb9bu1vp01dv9ie110s1conpm.png
[90]: http://static.zybuluo.com/zhangyy/96bvdo64tt6gu4ltig7b8s5w/image_1btu0d0qtcov1b241sdl1jsn1efqj.png
[91]: http://static.zybuluo.com/zhangyy/hy8mvhidefn74jofmnf6nam7/image_1btu0e2t6jr82nnlfq15mt17oarg.png
[92]: http://static.zybuluo.com/zhangyy/orxx2ziztlmi7pg3dlw0o5sx/image_1btu0er155qe17bs1o0u1sl417kert.png
[93]: http://static.zybuluo.com/zhangyy/bmdbowmlfi30tyvod3xjob54/image_1btu0g7ff1c2p10mb1pdt1jsu1rs5sq.png


Reposted from blog.51cto.com/flyfish225/2113453