Monitoring Spark performance metrics

The Spark UI on port 4040 already exposes monitoring metrics while a job is running, so why do we customize this and write the metrics into Redis?

1. To integrate with our own operations: the monitoring page can be built into our own data platform, which makes it easier to locate problems and to send e-mail alerts.

2. To add extra statistics on top of what the Spark UI already provides.

One, Spark's SparkListener
SparkListener is an abstract class that implements the SparkListenerInterface; to build custom monitoring we extend it and override the callbacks we need. Each SparkListener method is named after the event it handles, so the names are self-explanatory. To run custom logic at a particular point in the job lifecycle, simply extend SparkListener and override the corresponding method. These callbacks carry the metrics Spark collects at each phase of a job, and we can use them as our monitoring data.

abstract class SparkListener extends SparkListenerInterface {
  // called when a stage completes
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = { }
  // called when a stage is submitted
  override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit = { }
  override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = { }
  override def onTaskGettingResult(taskGettingResult: SparkListenerTaskGettingResult): Unit = { }
  // called when a task ends
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { }
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = { }
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = { }
  override def onEnvironmentUpdate(environmentUpdate: SparkListenerEnvironmentUpdate): Unit = { }
  override def onBlockManagerAdded(blockManagerAdded: SparkListenerBlockManagerAdded): Unit = { }
  override def onBlockManagerRemoved(blockManagerRemoved: SparkListenerBlockManagerRemoved): Unit = { }
  override def onUnpersistRDD(unpersistRDD: SparkListenerUnpersistRDD): Unit = { }
  override def onApplicationStart(applicationStart: SparkListenerApplicationStart): Unit = { }
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = { }
  override def onExecutorMetricsUpdate(executorMetricsUpdate: SparkListenerExecutorMetricsUpdate): Unit = { }
  override def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit = { }
  override def onExecutorRemoved(executorRemoved: SparkListenerExecutorRemoved): Unit = { }
  override def onBlockUpdated(blockUpdated: SparkListenerBlockUpdated): Unit = { }
  override def onOtherEvent(event: SparkListenerEvent): Unit = { }
}

1. Implement your own SparkListener and store the onTaskEnd metrics in Redis

(1) Create a class MySparkAppListener that extends SparkListener and override onTaskEnd inside it.

(2) The method: override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { }

  The SparkListenerTaskEnd class:

case class SparkListenerTaskEnd(
                                 // the id of the stage this task belongs to
                                 stageId: Int,
                                 // the attempt id of that stage
                                 stageAttemptId: Int,
                                 taskType: String,
                                 reason: TaskEndReason,
                                 // task information
                                 taskInfo: TaskInfo,
                                 // task metrics
                                 @Nullable taskMetrics: TaskMetrics)
  extends SparkListenerEvent

  

(3) Inside onTaskEnd, the information can be obtained from the taskInfo and taskMetrics members:

/**
 * 1. taskMetrics
 * 2. shuffle
 * 3. task input/output
 * 4. taskInfo
 */
(4) The monitoring information that can be obtained from TaskMetrics:
class TaskMetrics private[spark] () extends Serializable {
  // Each metric is internally represented as an accumulator
  private val _executorDeserializeTime = new LongAccumulator
  private val _executorDeserializeCpuTime = new LongAccumulator
  private val _executorRunTime = new LongAccumulator
  private val _executorCpuTime = new LongAccumulator
  private val _resultSize = new LongAccumulator
  private val _jvmGCTime = new LongAccumulator
  private val _resultSerializationTime = new LongAccumulator
  private val _memoryBytesSpilled = new LongAccumulator
  private val _diskBytesSpilled = new LongAccumulator
  private val _peakExecutionMemory = new LongAccumulator
  private val _updatedBlockStatuses = new CollectionAccumulator[(BlockId, BlockStatus)]
  /**
   * Metrics related to reading data, defined only in tasks with input.
   */
  val inputMetrics: InputMetrics = new InputMetrics()

  /**
   * Metrics related to writing data externally (e.g. to a distributed filesystem),
   * defined only in tasks with output.
   */
  val outputMetrics: OutputMetrics = new OutputMetrics()

  /**
   * Metrics related to shuffle read aggregated across all shuffle dependencies.
   * This is defined only if there are shuffle dependencies in this task.
   */
  val shuffleReadMetrics: ShuffleReadMetrics = new ShuffleReadMetrics()

  /**
   * Metrics related to shuffle write, defined only in shuffle map stages.
   */
  val shuffleWriteMetrics: ShuffleWriteMetrics = new ShuffleWriteMetrics()

  // ... (remaining members omitted)
}

(5) Implement the code and store the metrics in Redis

// imports assumed for this listener (JedisUtil is a project-local Jedis helper; Logging is the Spark-internal logging trait)
import org.apache.spark.internal.Logging
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, TaskInfo}
import org.json4s.DefaultFormats
import org.json4s.jackson.Json
import redis.clients.jedis.Jedis

/**
 * Requirement 1: store metrics of a running Spark job in Redis so they can be displayed in our own back end
 */
class MySparkAppListener extends SparkListener with Logging {

  val redisConf = "jedisConfig.properties"

  val jedis: Jedis = JedisUtil.getInstance().getJedis

  // first callback overridden from the parent class
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // information available inside onTaskEnd:
    /**
     * 1. taskMetrics
     * 2. shuffle
     * 3. task input/output
     * 4. taskInfo
     */

    val currentTimestamp = System.currentTimeMillis()
    // metrics available from TaskMetrics:
    /**
     * private val _executorDeserializeTime = new LongAccumulator
     * private val _executorDeserializeCpuTime = new LongAccumulator
     * private val _executorRunTime = new LongAccumulator
     * private val _executorCpuTime = new LongAccumulator
     * private val _resultSize = new LongAccumulator
     * private val _jvmGCTime = new LongAccumulator
     * private val _resultSerializationTime = new LongAccumulator
     * private val _memoryBytesSpilled = new LongAccumulator
     * private val _diskBytesSpilled = new LongAccumulator
     * private val _peakExecutionMemory = new LongAccumulator
     * private val _updatedBlockStatuses = new CollectionAccumulator[(BlockId, BlockStatus)]
     */
    val metrics = taskEnd.taskMetrics
    val taskMetricsMap = scala.collection.mutable.HashMap(
      "ExecutorDeserializeTime" -> metrics.executorDeserializeTime, // executor deserialization time
      "ExecutorDeserializeCpuTime" -> metrics.executorDeserializeCpuTime, // executor deserialization CPU time
      "ExecutorRunTime" -> metrics.executorRunTime, // executor run time
      "ResultSize" -> metrics.resultSize, // result set size
      "jvmGCTime" -> metrics.jvmGCTime, // JVM GC time
      "resultSerializationTime" -> metrics.resultSerializationTime, // result serialization time
      "MemoryBytesSpilled" -> metrics.memoryBytesSpilled, // bytes spilled to memory
      "DiskBytesSpilled" -> metrics.diskBytesSpilled, // bytes spilled to disk
      "peakExecutionMemory" -> metrics.peakExecutionMemory // peak execution memory of the executor
    )

    val jedisKey = "taskMetrics_" + currentTimestamp
    jedis.set(jedisKey, Json(DefaultFormats).write(taskMetricsMap)) // write the map, not the key
    jedis.expire(jedisKey, 3600) // expire after 3600 seconds


    // ====================== shuffle metrics ======================
    val shuffleReadMetrics = metrics.shuffleReadMetrics
    val shuffleWriteMetrics = metrics.shuffleWriteMetrics

    // metrics available in shuffleWriteMetrics (shuffle write side):
    /**
     * private[executor] val _bytesWritten = new LongAccumulator
     * private[executor] val _recordsWritten = new LongAccumulator
     * private[executor] val _writeTime = new LongAccumulator
     */
    // metrics available in shuffleReadMetrics (shuffle read side):
    /**
     * private[executor] val _remoteBlocksFetched = new LongAccumulator
     * private[executor] val _localBlocksFetched = new LongAccumulator
     * private[executor] val _remoteBytesRead = new LongAccumulator
     * private[executor] val _localBytesRead = new LongAccumulator
     * private[executor] val _fetchWaitTime = new LongAccumulator
     * private[executor] val _recordsRead = new LongAccumulator
     */

    val shuffleMap = scala.collection.mutable.HashMap(
      "RemoteBlocksFetched" -> shuffleReadMetrics.remoteBlocksFetched, // blocks fetched from remote executors during shuffle
      "LocalBlocksFetched" -> shuffleReadMetrics.localBlocksFetched, // blocks fetched locally
      "RemoteBytesRead" -> shuffleReadMetrics.remoteBytesRead, // bytes read from remote executors during shuffle
      "LocalBytesRead" -> shuffleReadMetrics.localBytesRead, // bytes read locally
      "FetchWaitTime" -> shuffleReadMetrics.fetchWaitTime, // time spent waiting for fetched data
      "RecordsRead" -> shuffleReadMetrics.recordsRead, // total records read during shuffle
      "BytesWritten" -> shuffleWriteMetrics.bytesWritten, // total bytes written during shuffle
      "RecordsWritten" -> shuffleWriteMetrics.recordsWritten, // total records written during shuffle
      "writeTime" -> shuffleWriteMetrics.writeTime // time spent writing shuffle data
    )

    val shuffleKey = s"shuffleKey${currentTimestamp}"
    jedis.set(shuffleKey, Json(DefaultFormats).write(shuffleMap))
    jedis.expire(shuffleKey, 3600)

    // ====================== input and output ======================
    val inputMetrics = taskEnd.taskMetrics.inputMetrics
    val outputMetrics = taskEnd.taskMetrics.outputMetrics

    val input_output = scala.collection.mutable.HashMap(
      "BytesRead" -> inputMetrics.bytesRead, // bytes read
      "RecordsRead" -> inputMetrics.recordsRead, // records read
      "BytesWritten" -> outputMetrics.bytesWritten, // bytes written
      "RecordsWritten" -> outputMetrics.recordsWritten // records written
    )
    val input_outputKey = s"input_outputKey${currentTimestamp}"
    jedis.set(input_outputKey, Json(DefaultFormats).write(input_output))
    jedis.expire(input_outputKey, 3600)



    // ====================== taskInfo ======================
    val taskInfo: TaskInfo = taskEnd.taskInfo

    val taskInfoMap = scala.collection.mutable.HashMap(
      "taskId" -> taskInfo.taskId,
      "host" -> taskInfo.host,
      "Speculative" -> taskInfo.speculative, // speculative execution
      "failed" -> taskInfo.failed,
      "killed" -> taskInfo.killed,
      "running" -> taskInfo.running
    )

    val taskInfoKey = s"taskInfo${currentTimestamp}"
    jedis.set(taskInfoKey , Json(DefaultFormats).write(taskInfoMap))
    jedis.expire(taskInfoKey , 3600)

  }
}

(6) Test the listener

  In your main class, register the listener with sparkContext.addSparkListener:

sc.addSparkListener(new MySparkAppListener())

Test it with a simple word count, as in the sketch below.
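
A minimal word-count driver for exercising the listener; the object name, master setting and sample data are illustrative, and MySparkAppListener is the class defined above.

import org.apache.spark.{SparkConf, SparkContext}

object ListenerWordCountTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("listener-wordcount").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // register the custom listener before running any job
    sc.addSparkListener(new MySparkAppListener())

    sc.parallelize(Seq("spark listener test", "spark metrics", "listener"))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    sc.stop()
  }
}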

 

 

 

Two, Spark Streaming real-time monitoring

1. StreamingListener is the interface for real-time monitoring. It exposes events for receiver started, receiver error, receiver stopped, batch submitted, batch started, batch completed and so on; the principle is the same as above.

The relevant StreamingListener callbacks:

trait StreamingListener {

  /** Called when a receiver has been started */
  def onReceiverStarted(receiverStarted: StreamingListenerReceiverStarted) { }

  /** Called when a receiver has reported an error */
  def onReceiverError(receiverError: StreamingListenerReceiverError) { }

  /** Called when a receiver has been stopped */
  def onReceiverStopped(receiverStopped: StreamingListenerReceiverStopped) { }

  /** Called when a batch of jobs has been submitted for processing. */
  def onBatchSubmitted(batchSubmitted: StreamingListenerBatchSubmitted) { }

  /** Called when processing of a batch of jobs has started.  */
  def onBatchStarted(batchStarted: StreamingListenerBatchStarted) { }

  /** Called when processing of a batch of jobs has completed. */
  def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) { }

  /** Called when processing of a job of a batch has started. */
  def onOutputOperationStarted(
      outputOperationStarted: StreamingListenerOutputOperationStarted) { }

  /** Called when processing of a job of a batch has completed. */
  def onOutputOperationCompleted(
      outputOperationCompleted: StreamingListenerOutputOperationCompleted) { }
}

2. Key callbacks and their uses

1. onReceiverError

Monitor data-reception errors and send e-mail alerts.

2. onBatchCompleted: this method is called when a batch has completed. Uses:

(1) Committing Spark Streaming offsets: persist the offsets to storage only after the batch has finished executing. (This alone cannot guarantee exactly-once behaviour: if the program is interrupted after the results are stored but before the offsets are committed, the offsets remain uncommitted.)
(2) If a batch takes longer than the configured batch interval, the job is blocked or falling behind; send an e-mail alert. A sketch of such a listener follows.
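
A minimal sketch of such a streaming listener, assuming the class name, the batchIntervalMs parameter and the println placeholders (swap in real offset commits and e-mail alerting); the callback signatures come from org.apache.spark.streaming.scheduler.

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted, StreamingListenerReceiverError}

class MyStreamingListener(batchIntervalMs: Long) extends StreamingListener {

  override def onReceiverError(receiverError: StreamingListenerReceiverError): Unit = {
    // a receiver reported an error: hook e-mail alerting in here
    println(s"receiver error: ${receiverError.receiverInfo.lastErrorMessage}")
  }

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    // the batch has fully completed: commit/persist the offsets here

    // alert if processing took longer than the batch interval (the job is falling behind)
    info.processingDelay.foreach { delayMs =>
      if (delayMs > batchIntervalMs) {
        println(s"batch ${info.batchTime} took ${delayMs} ms, exceeding the ${batchIntervalMs} ms interval")
      }
    }
  }
}

// register it on the StreamingContext, e.g. ssc.addStreamingListener(new MyStreamingListener(5000))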

Three, parsing the metrics JSON returned by the Spark / YARN web interface

1. Start a local Spark program
Visit http://localhost:4040/metrics/json/ to get a JSON document; parse the gauges and you can get all of the information, for example:
{
    "version": "3.0.0", 
    "gauges": {
        "local-1581865176069.driver.BlockManager.disk.diskSpaceUsed_MB": {
            "value": 0
        }, 
        "local-1581865176069.driver.BlockManager.memory.maxMem_MB": {
            "value": 1989
        }, 
        "local-1581865176069.driver.BlockManager.memory.memUsed_MB": {
            "value": 0
        }, 
        "local-1581865176069.driver.BlockManager.memory.remainingMem_MB": {
            "value": 1989
        }, 
        "local-1581865176069.driver.DAGScheduler.job.activeJobs": {
            "value": 0
        }, 
        "local-1581865176069.driver.DAGScheduler.job.allJobs": {
            "value": 0
        }, 
        "local-1581865176069.driver.DAGScheduler.stage.failedStages": {
            "value": 0
        }, 
        "local-1581865176069.driver.DAGScheduler.stage.runningStages": {
            "value": 0
        }, 
        "local-1581865176069.driver.DAGScheduler.stage.waitingStages": {
            "value": 0
        }
    }, 
    "counters": {
        "local-1581865176069.driver.HiveExternalCatalog.fileCacheHits": {
            "count": 0
        }, 
        "local-1581865176069.driver.HiveExternalCatalog.filesDiscovered": {
            "count": 0
        }, 
        "local-1581865176069.driver.HiveExternalCatalog.hiveClientCalls": {
            "count": 0
        }, 
        "local-1581865176069.driver.HiveExternalCatalog.parallelListingJobCount": {
            "count": 0
        }, 
        "local-1581865176069.driver.HiveExternalCatalog.partitionsFetched": {
            "count": 0
        }
    }, 
    "histograms": {
        "local-1581865176069.driver.CodeGenerator.compilationTime": {
            "count": 0, 
            "max": 0, 
            "mean": 0, 
            "min": 0, 
            "p50": 0, 
            "p75": 0, 
            "p95": 0, 
            "p98": 0, 
            "p99": 0, 
            "p999": 0, 
            "stddev": 0
        }, 
        "local-1581865176069.driver.CodeGenerator.generatedClassSize": {
            "count": 0, 
            "max": 0, 
            "mean": 0, 
            "min": 0, 
            "p50": 0, 
            "p75": 0, 
            "p95": 0, 
            "p98": 0, 
            "p99": 0, 
            "p999": 0, 
            "stddev": 0
        }, 
        "local-1581865176069.driver.CodeGenerator.generatedMethodSize": {
            "count": 0, 
            "max": 0, 
            "mean": 0, 
            "min": 0, 
            "p50": 0, 
            "p75": 0, 
            "p95": 0, 
            "p98": 0, 
            "p99": 0, 
            "p999": 0, 
            "stddev": 0
        }, 
        "local-1581865176069.driver.CodeGenerator.sourceCodeSize": {
            "count": 0, 
            "max": 0, 
            "mean": 0, 
            "min": 0, 
            "p50": 0, 
            "p75": 0, 
            "p95": 0, 
            "p98": 0, 
            "p99": 0, 
            "p999": 0, 
            "stddev": 0
        }
    }, 
    "meters": { }, 
    "timers": {
        "local-1581865176069.driver.DAGScheduler.messageProcessingTime": {
            "count": 0, 
            "max": 0, 
            "mean": 0, 
            "min": 0, 
            "p50": 0, 
            "p75": 0, 
            "p95": 0, 
            "p98": 0, 
            "p99": 0, 
            "p999": 0, 
            "stddev": 0, 
            "m15_rate": 0, 
            "m1_rate": 0, 
            "m5_rate": 0, 
            "mean_rate": 0, 
            "duration_units": "milliseconds", 
            "rate_units": "calls/second"
        }
    }
}
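
A hedged sketch of parsing this endpoint with json4s (the same JSON library used above); the object name, the URL and the two gauge suffixes picked out below are illustrative and should be adapted to your application id.

import scala.io.Source
import org.json4s._
import org.json4s.jackson.JsonMethods._

object MetricsJsonParser {
  implicit val formats: Formats = DefaultFormats

  def main(args: Array[String]): Unit = {
    val body = Source.fromURL("http://localhost:4040/metrics/json/").mkString
    val gauges = parse(body) \ "gauges"

    // each gauge key has the form "<appId>.driver.<source>.<metric>" and carries a single "value" field
    gauges match {
      case JObject(fields) =>
        fields
          .filter { case (name, _) =>
            name.endsWith("BlockManager.memory.memUsed_MB") || name.endsWith("DAGScheduler.job.activeJobs")
          }
          .foreach { case (name, value) =>
            println(s"$name = ${compact(render(value \ "value"))}")
          }
      case _ => println("no gauges found")
    }
  }
}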

 



2. When the Spark job is submitted to YARN

  val sparkDriverHost = sc.getConf.get("spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES")
  // the monitoring URL is the cluster proxy path + /proxy/ + application id + /metrics/json
  val url = s"${sparkDriverHost}/metrics/json"
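
A hedged follow-on sketch: from inside the driver the same JSON can be fetched through the proxy URL built above and parsed exactly as in the local example (the plain Source.fromURL call and the lack of error handling are simplifications).

  // fetch the metrics JSON through the YARN proxy and hand it to the same json4s parsing as above
  val metricsJson: String = scala.io.Source.fromURL(url).mkString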

3. Uses of the metrics

1. The job-level fields (endTime, applicationUniqueName, applicationId, sourceCount, costTime, countPerMillis) can be put into a table for per-run statistics.

2. The disk and memory information can be shown as a pie chart to monitor memory and disk usage.

3. The job and stage running status can be shown as a table to monitor the job.

    val fieldMap = scala.collection.mutable.Map(
      // ================= table: per-run statistics =================
      "applicationId" -> monitorIndex._3.toString,
      "endTime" -> new DateTime(monitorIndex._1).toDateTime.toString("yyyy-MM-dd HH:mm:ss"),
      "applicationUniqueName" -> monitorIndex._2.toString,
      "SourceCount" -> monitorIndex._4.toString, // number of source records in the current batch
      "CostTime" -> monitorIndex._5.toString, // time spent on the batch
      "countPerMillis" -> monitorIndex._6.toString,
      "serversCountMap" -> serversCountMap,
      // ================= pie chart: memory and disk =================
      "DiskSpaceUsed_MB" -> diskSpaceUsed_MB, // disk space used
      "MaxMem_MB" -> maxMem_MB, // maximum memory
      "MemUsed_MB" -> memUsed_MB, // memory used
      "RemainingMem_MB" -> remainingMem_MB, // remaining memory
      // ================= table: job status =================
      "ActiveJobs" -> activeJobs, // currently running jobs
      "AllJobs" -> allJobs, // all jobs
      "FailedStages" -> failedStages, // failed stages
      "RunningStages" -> runningStages, // running stages
      "WaitingStages" -> waitingStages, // stages waiting to run
      "LastCompletedBatch_processingDelay" -> lastCompletedBatch_processingDelay, // processing delay of the last completed batch
      "LastCompletedBatch_processingTime" -> lastCompletedBatch_processingTime, // processing time of the last completed batch
      "LastReceivedBatch_records" -> lastReceivedBatch_records, // records in the most recently received batch
      "RunningBatches" -> runningBatches, // batches currently running
      "TotalCompletedBatches" -> totalCompletedBatches, // total completed batches
      "TotalProcessedRecords" -> totalProcessedRecords, // total records processed
      "TotalReceivedRecords" -> totalReceivedRecords, // total records received
      "UnprocessedBatches" -> unprocessedBatches, // unprocessed batches
      "WaitingBatches" -> waitingBatches // batches waiting to be processed
    )
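
As a hedged sketch consistent with the Redis approach above, the assembled map can be written out with the same json4s call; the key prefix and the 3600-second TTL are illustrative choices.

    val monitorKey = s"streamingMonitor_${System.currentTimeMillis()}"
    jedis.set(monitorKey, Json(DefaultFormats).write(fieldMap))
    jedis.expire(monitorKey, 3600)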

Origin www.cnblogs.com/hejunhong/p/12318921.html