Spark on YARN execution process and log analysis.

Submit command

${SPARK_HOME}/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 1g \
    --executor-cores 4 \
    --queue default \
    ${SPARK_HOME}/examples/jars/spark-examples*.jar \
    10

Execution process

  1. The client runs spark-submit to submit the application to the ResourceManager and request resources.

  2. After the ResourceManager receives the request, it selects a NodeManager in the cluster, allocates the first container for the application, and launches the ApplicationMaster inside it. The ApplicationMaster contains the driver, which then starts running (in practice, it runs the program written by the user).

  3. Driver:

    (1) The driver runs the main method of the application.

    (2) The SparkContext object is constructed inside the main method. This object is very important: it is the entry point of every Spark program. Inside SparkContext, two more objects are built, the DAGScheduler and the TaskScheduler.

    (3) The program performs a large number of RDD transformations, and eventually an action triggers the actual execution. At that point a directed acyclic graph (DAG) is built from the relationships between the RDDs in the code; the direction of the graph is the order of the RDD operator calls. Finally, the DAG is handed to the DAGScheduler (a minimal driver sketch follows this list).

    (4) After the DAGScheduler receives the DAG, it splits it into stages at wide (shuffle) dependencies. Each stage contains many tasks that can run in parallel, grouped into a TaskSet. The DAGScheduler then submits the stages one by one, sending each stage's TaskSet to the TaskScheduler.

    (5) After the TaskScheduler receives the TaskSets, it runs them according to the stage dependencies. When executing a TaskSet, the TaskScheduler iterates over it and dispatches each task in turn to an executor for execution.

    The driver only splits the job into tasks; the actual execution happens inside the YARN containers.

  4. The ApplicationMaster registers with the ResourceManager, so the running status of the job can be viewed through the RM. At the same time, the AM requests resources (containers) for the tasks and monitors their execution until they finish.

  5. Once the AM has obtained the resources (containers), it communicates with the NodeManagers and has them launch CoarseGrainedExecutorBackend in the allocated containers. When CoarseGrainedExecutorBackend starts, it registers with the SparkContext in the AM and asks for tasks.

  6. The SparkContext in the AM assigns tasks to the CoarseGrainedExecutorBackend processes. While running a task, CoarseGrainedExecutorBackend reports the task's progress and status back to the AM, so the AM can track execution at any time and retry a task when it fails or when cluster resources are tight.

  7. When the job completes, the AM sends a request to the RM to deregister itself and shuts down.
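
To make steps (1)-(5) concrete, here is a minimal Scala sketch of a SparkPi-style driver (an approximation of the example program, not the exact spark-examples source): the main method builds the SparkContext, chains RDD transformations, and the final reduce action triggers DAG construction and task scheduling.

import org.apache.spark.{SparkConf, SparkContext}

object PiSketch {
  def main(args: Array[String]): Unit = {
    // step (2): SparkContext is the entry point; it builds DAGScheduler and TaskScheduler internally
    val conf = new SparkConf().setAppName("Spark Pi")        // master / deploy-mode come from spark-submit
    val sc   = new SparkContext(conf)

    val slices = if (args.length > 0) args(0).toInt else 2   // the "10" passed on the command line
    val n = 100000 * slices

    // step (3): transformations only describe the computation; nothing runs yet
    val count = sc.parallelize(1 to n, slices)
      .map { _ =>
        val x = math.random * 2 - 1
        val y = math.random * 2 - 1
        if (x * x + y * y <= 1) 1 else 0
      }
      .reduce(_ + _)   // the action: the DAG is built and stages/TaskSets are submitted to the executors

    println(s"Pi is roughly ${4.0 * count / n}")
    sc.stop()
  }
}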


Execution log (yarn-cluster)

22/11/19 17:42:18 WARN util.Utils: Your hostname, macdeMacBook-Pro-3.local resolves to a loopback address: 127.0.0.1; using 10.10.9.250 instead (on interface en0)
22/11/19 17:42:18 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/11/19 17:42:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/19 17:42:19 INFO client.RMProxy: Connecting to ResourceManager at sh01/172.16.99.214:8010
22/11/19 17:42:19 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
22/11/19 17:42:19 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
22/11/19 17:42:19 INFO yarn.Client: Will allocate AM container, with 4505 MB memory including 409 MB overhead
22/11/19 17:42:19 INFO yarn.Client: Setting up container launch context for our AM
22/11/19 17:42:19 INFO yarn.Client: Setting up the launch environment for our AM container
22/11/19 17:42:19 INFO yarn.Client: Preparing resources for our AM container
22/11/19 17:42:20 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/11/19 17:42:23 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-b423d166-c45e-429a-b25a-3efde9c1145c/__spark_libs__2899998199838240455.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205/__spark_libs__2899998199838240455.zip
22/11/19 17:45:52 INFO yarn.Client: Uploading resource file:/usr/local/spark/examples/jars/spark-examples_2.11-2.4.8.jar -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205/spark-examples_2.11-2.4.8.jar
22/11/19 17:45:54 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-b423d166-c45e-429a-b25a-3efde9c1145c/__spark_conf__8349177025085739013.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205/__spark_conf__.zip
22/11/19 17:45:56 INFO spark.SecurityManager: Changing view acls to: mac
22/11/19 17:45:56 INFO spark.SecurityManager: Changing modify acls to: mac
22/11/19 17:45:56 INFO spark.SecurityManager: Changing view acls groups to:
22/11/19 17:45:56 INFO spark.SecurityManager: Changing modify acls groups to:
22/11/19 17:45:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mac); groups with view permissions: Set(); users  with modify permissions: Set(mac); groups with modify permissions: Set()
22/11/19 17:45:57 INFO yarn.Client: Submitting application application_1666603193487_2205 to ResourceManager
22/11/19 17:45:57 INFO impl.YarnClientImpl: Submitted application application_1666603193487_2205
22/11/19 17:45:58 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:45:58 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1668851157430
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2205/
	 user: mac
22/11/19 17:45:59 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:00 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:01 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:02 INFO yarn.Client: Application report for application_1666603193487_2205 (state: ACCEPTED)
22/11/19 17:46:03 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:03 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: sh02
	 ApplicationMaster RPC port: 46195
	 queue: default
	 start time: 1668851157430
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2205/
	 user: mac 
22/11/19 17:46:04 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:05 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:06 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:07 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:08 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:09 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:10 INFO yarn.Client: Application report for application_1666603193487_2205 (state: RUNNING)
22/11/19 17:46:11 INFO yarn.Client: Application report for application_1666603193487_2205 (state: FINISHED)
22/11/19 17:46:11 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: sh02
	 ApplicationMaster RPC port: 46195
	 queue: default
	 start time: 1668851157430
	 final status: SUCCEEDED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2205/
	 user: mac
22/11/19 17:46:12 INFO yarn.Client: Deleted staging directory hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205
22/11/19 17:46:12 INFO util.ShutdownHookManager: Shutdown hook called
22/11/19 17:46:12 INFO util.ShutdownHookManager: Deleting directory /private/var/folders/pc/mj2v_vln4x14q6jylbtnmvx40000gn/T/spark-b39d7673-82ac-471c-8f8a-f667b8b081f2
22/11/19 17:46:12 INFO util.ShutdownHookManager: Deleting directory /usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-b423d166-c45e-429a-b25a-3efde9c1145c

4-6: Connect to the ResourceManager, request a new application from a cluster made up of two NodeManagers, and verify that the memory requested by the application does not exceed the cluster's maximum memory capability; each container in the cluster can hold at most about 8 GB (8192 MB).

8-14: A 4505 MB container is allocated to the ApplicationMaster. What does "including 409 MB overhead" mean? As mentioned above, the AM contains the driver, and when we submitted the job we requested 4 GB (4096 MB) of memory for the driver; 4505 - 4096 = 409, so the RM allocated more memory than we asked for. We won't dig into the details here, but the number matches Spark's default overhead rule, sketched below. The next step is to set up the launch context for the AM container and prepare its resources: the Spark dependency libraries (I checked, about 244 MB), the application jar, and the Spark configuration files are packaged and uploaded to the HDFS staging directory hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2205, where they wait for the application to run. This directory is deleted once the application finishes.
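
A rough sketch of the sizing rule (assuming Spark 2.x defaults, with no explicit spark.driver.memoryOverhead / spark.yarn.am.memoryOverhead set): the overhead defaults to 10% of the requested memory, with a 384 MB floor, which reproduces both numbers seen in these logs.

// Rough sketch of the Spark 2.x default YARN memory overhead rule
// (assumes no explicit spark.driver.memoryOverhead / spark.yarn.am.memoryOverhead)
def defaultOverheadMb(requestedMb: Long): Long =
  math.max((requestedMb * 0.10).toLong, 384L)

// cluster mode: the AM container also holds the driver (--driver-memory 4g)
val clusterAmMb = 4096 + defaultOverheadMb(4096)   // 4096 + 409 = 4505 MB, as in this log

// client mode: the AM is a thin launcher with a small default heap (512 MB in Spark 2.x)
val clientAmMb = 512 + defaultOverheadMb(512)      // 512 + 384 = 896 MB, as in the yarn-client log below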

15-19: Security checks.

20-21: Submit the application to the RM. Note that the application ID is application_1666603193487_2205, the same as the directory name used when uploading resources to HDFS; presumably this is the application the AM will be launched for.

22-36: The AM is requesting containers (resources) from the RM to run the tasks, which is why the state shows ACCEPTED. Why say that? Because sometimes ACCEPTED lasts a long time when other jobs are already running in the cluster and no spare resources are available, from which we can infer that at this point the AM is requesting resources for the job.

37-55: The job starts running. From line 41 you can see that the AM container was allocated on machine sh02. (sh01: RM, sh02: NM, sh03: NM)

56-69: The job finishes, the staging files on HDFS are deleted, and the local temporary directories are deleted; the directory names match the ones created earlier.


What do "client" and "driver" refer to?

  • Client: the machine where the spark-submit command is run is called the client.

  • Driver: the program submitted by the user runs as the driver.

Where is the driver? First, picture a few servers.

Server  Role
sh01    ResourceManager
sh02    NodeManager
sh03    NodeManager
sh04    only has the big-data cluster configuration files (edge/client node)

Submit the job from sh04:

  • yarn-cluster mode: this is the scenario described above. The driver is not on the client but inside the AM on sh02. While the job runs, the communication between the task containers and the AM (the driver inside the AM), and between the AM and the RM, has nothing to do with the client; the client only receives what is sent to stdout. Even if the client goes away, the job keeps running.

  • yarn-client mode: the driver lives on the client, and the job cannot run without the client. (The rest of the details are in the log at the end, which you can analyze yourself; a small sketch for telling the two modes apart at run time follows below.)

In real development, sh01, sh02, and sh03 usually form the big-data cluster, and sh04 is typically just a machine used to submit jobs.
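
If you want to confirm at run time where the driver actually is, the deploy mode chosen by spark-submit is visible in the driver's configuration; a minimal sketch (Spark 2.x property names, e.g. pasted into spark-shell or placed inside the driver's main method):

import org.apache.spark.SparkConf

// spark-submit passes its settings to the driver JVM as system properties,
// so a fresh SparkConf created in the driver sees them.
val conf = new SparkConf()
// "cluster" -> the driver runs inside the AM container on a NodeManager (sh02/sh03 here)
// "client"  -> the driver runs in the spark-submit JVM on the client machine (sh04 here)
val mode = conf.get("spark.submit.deployMode", "client")
println(s"deploy mode = $mode, driver host = ${conf.get("spark.driver.host", "unknown")}")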


What is the relationship between a YARN container and an executor?

On YARN, both the executors and the ApplicationMaster must run inside a "container". Container here does not mean Docker; it represents a slice of storage and compute resources on a physical machine. These resources are monitored by the NM and scheduled by the RM, and YARN allocates them in units of containers. Both the executor and the ApplicationMaster are processes, and they can only run after a container has been allocated to them.
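
The executors in this submission follow the same sizing rule as the AM container above: one container per executor, sized at the executor memory plus its overhead. A rough sketch with the values from the command above (assuming Spark 2.x defaults and no explicit spark.executor.memoryOverhead):

// Sketch: the YARN container requested for each executor in this run
def defaultOverheadMb(requestedMb: Long): Long =
  math.max((requestedMb * 0.10).toLong, 384L)

val executorMemMb  = 1024                                               // --executor-memory 1g
val executorCores  = 4                                                  // --executor-cores 4
val containerMemMb = executorMemMb + defaultOverheadMb(executorMemMb)   // 1024 + 384 = 1408 MB
// YARN grants one container of roughly 1408 MB and 4 vcores; the executor process
// (CoarseGrainedExecutorBackend) is then launched inside that container by the NodeManager.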


yarn-client log

${SPARK_HOME}/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 4g \
    --executor-memory 1g \
    --executor-cores 4 \
    --queue default \
    ${SPARK_HOME}/examples/jars/spark-examples*.jar \
    10
22/11/19 18:33:36 WARN util.Utils: Your hostname, macdeMacBook-Pro-3.local resolves to a loopback address: 127.0.0.1; using 10.10.9.250 instead (on interface en0)
22/11/19 18:33:36 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/11/19 18:33:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/11/19 18:33:36 INFO spark.SparkContext: Running Spark version 2.4.8
22/11/19 18:33:36 INFO spark.SparkContext: Submitted application: Spark Pi
22/11/19 18:33:36 INFO spark.SecurityManager: Changing view acls to: mac
22/11/19 18:33:36 INFO spark.SecurityManager: Changing modify acls to: mac
22/11/19 18:33:36 INFO spark.SecurityManager: Changing view acls groups to:
22/11/19 18:33:36 INFO spark.SecurityManager: Changing modify acls groups to:
22/11/19 18:33:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mac); groups with view permissions: Set(); users  with modify permissions: Set(mac); groups with modify permissions: Set()
22/11/19 18:33:37 INFO util.Utils: Successfully started service 'sparkDriver' on port 53336.
22/11/19 18:33:37 INFO spark.SparkEnv: Registering MapOutputTracker
22/11/19 18:33:37 INFO spark.SparkEnv: Registering BlockManagerMaster
22/11/19 18:33:37 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/11/19 18:33:37 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/11/19 18:33:37 INFO storage.DiskBlockManager: Created local directory at /usr/local/spark-2.4.8-bin-hadoop2.7/tmp/blockmgr-ea23e012-50a5-4ad2-a2c0-cf40ea020a9e
22/11/19 18:33:37 INFO memory.MemoryStore: MemoryStore started with capacity 2004.6 MB
22/11/19 18:33:37 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/11/19 18:33:37 INFO util.log: Logging initialized @2435ms to org.spark_project.jetty.util.log.Slf4jLog
22/11/19 18:33:37 INFO server.Server: jetty-9.4.z-SNAPSHOT; built: unknown; git: unknown; jvm 1.8.0_333-b02
22/11/19 18:33:37 INFO server.Server: Started @2564ms
22/11/19 18:33:37 INFO server.AbstractConnector: Started ServerConnector@62b3df3a{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/11/19 18:33:37 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@169da7f2{/jobs,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@757f675c{/jobs/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2617f816{/jobs/job,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5d10455d{/jobs/job/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@535b8c24{/stages,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a951911{/stages/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55b62629{/stages/stage,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6759f091{/stages/stage/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33a053d{/stages/pool,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14a54ef6{/stages/pool/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20921b9b{/storage,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@867ba60{/storage/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ba745bc{/storage/rdd,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@654b72c0{/storage/rdd/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55b5e331{/environment,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6034e75d{/environment/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15fc442{/executors,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c7bdb{/executors/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@456abb66{/executors/threadDump,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a3a299{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7da10b5b{/static,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1da6ee17{/,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@78d39a69{/api,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15f193b8{/jobs/job/kill,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2516fc68{/stages/stage/kill,null,AVAILABLE,@Spark}
22/11/19 18:33:37 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.10.9.250:4040
22/11/19 18:33:37 INFO spark.SparkContext: Added JAR file:/usr/local/spark/examples/jars/spark-examples_2.11-2.4.8.jar at spark://10.10.9.250:53336/jars/spark-examples_2.11-2.4.8.jar with timestamp 1668854017716
22/11/19 18:33:38 INFO client.RMProxy: Connecting to ResourceManager at sh01/172.16.99.214:8010
22/11/19 18:33:38 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
22/11/19 18:33:38 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
22/11/19 18:33:38 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
22/11/19 18:33:38 INFO yarn.Client: Setting up container launch context for our AM
22/11/19 18:33:38 INFO yarn.Client: Setting up the launch environment for our AM container
22/11/19 18:33:38 INFO yarn.Client: Preparing resources for our AM container
22/11/19 18:33:39 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/11/19 18:33:42 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-7ecf7a1c-87e6-4f76-8e50-cd1682762c25/__spark_libs__7614795133133378512.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2206/__spark_libs__7614795133133378512.zip
22/11/19 18:37:46 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-7ecf7a1c-87e6-4f76-8e50-cd1682762c25/__spark_conf__885526568489264491.zip -> hdfs://sh01:9000/user/mac/.sparkStaging/application_1666603193487_2206/__spark_conf__.zip
22/11/19 18:37:48 INFO spark.SecurityManager: Changing view acls to: mac
22/11/19 18:37:48 INFO spark.SecurityManager: Changing modify acls to: mac
22/11/19 18:37:48 INFO spark.SecurityManager: Changing view acls groups to:
22/11/19 18:37:48 INFO spark.SecurityManager: Changing modify acls groups to:
22/11/19 18:37:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mac); groups with view permissions: Set(); users  with modify permissions: Set(mac); groups with modify permissions: Set()
22/11/19 18:37:49 INFO yarn.Client: Submitting application application_1666603193487_2206 to ResourceManager
22/11/19 18:37:50 INFO impl.YarnClientImpl: Submitted application application_1666603193487_2206
22/11/19 18:37:50 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1666603193487_2206 and attemptId None
22/11/19 18:37:51 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:51 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1668854270205
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2206/
	 user: mac
22/11/19 18:37:52 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:53 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:54 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:55 INFO yarn.Client: Application report for application_1666603193487_2206 (state: ACCEPTED)
22/11/19 18:37:55 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sh01, PROXY_URI_BASES -> http://sh01:8012/proxy/application_1666603193487_2206), /proxy/application_1666603193487_2206
22/11/19 18:37:55 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
22/11/19 18:37:56 INFO yarn.Client: Application report for application_1666603193487_2206 (state: RUNNING)
22/11/19 18:37:56 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 172.16.99.116
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1668854270205
	 final status: UNDEFINED
	 tracking URL: http://sh01:8012/proxy/application_1666603193487_2206/
	 user: mac
22/11/19 18:37:56 INFO cluster.YarnClientSchedulerBackend: Application application_1666603193487_2206 has started running.
22/11/19 18:37:56 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54084.
22/11/19 18:37:56 INFO netty.NettyBlockTransferService: Server created on 10.10.9.250:54084
22/11/19 18:37:56 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/11/19 18:37:56 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.10.9.250:54084 with 2004.6 MB RAM, BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.10.9.250, 54084, None)
22/11/19 18:37:56 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
22/11/19 18:37:56 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@238291d4{/metrics/json,null,AVAILABLE,@Spark}
22/11/19 18:37:56 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
22/11/19 18:37:57 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Missing parents: List()
22/11/19 18:37:57 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
22/11/19 18:37:57 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 2004.6 MB)
22/11/19 18:37:58 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1358.0 B, free 2004.6 MB)
22/11/19 18:37:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.10.9.250:54084 (size: 1358.0 B, free: 2004.6 MB)
22/11/19 18:37:58 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
22/11/19 18:37:58 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
22/11/19 18:37:58 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks
22/11/19 18:37:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.99.116:48068) with ID 2
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sh02, executor 2, partition 0, PROCESS_LOCAL, 7741 bytes)
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sh02, executor 2, partition 1, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, sh02, executor 2, partition 2, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:37:59 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, sh02, executor 2, partition 3, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager sh02:44398 with 366.3 MB RAM, BlockManagerId(2, sh02, 44398, None)
22/11/19 18:38:02 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sh02:44398 (size: 1358.0 B, free: 366.3 MB)
22/11/19 18:38:02 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.97.106:57790) with ID 1
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, sh03, executor 1, partition 4, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, sh03, executor 1, partition 5, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, sh03, executor 1, partition 6, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, sh03, executor 1, partition 7, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, sh02, executor 2, partition 8, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, sh02, executor 2, partition 9, PROCESS_LOCAL, 7743 bytes)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2609 ms on sh02 (executor 2) (1/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 2608 ms on sh02 (executor 2) (2/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2622 ms on sh02 (executor 2) (3/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2645 ms on sh02 (executor 2) (4/10)
22/11/19 18:38:02 INFO storage.BlockManagerMasterEndpoint: Registering block manager sh03:45892 with 366.3 MB RAM, BlockManagerId(1, sh03, 45892, None)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 378 ms on sh02 (executor 2) (5/10)
22/11/19 18:38:02 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 407 ms on sh02 (executor 2) (6/10)
22/11/19 18:38:04 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sh03:45892 (size: 1358.0 B, free: 366.3 MB)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 2762 ms on sh03 (executor 1) (7/10)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 2787 ms on sh03 (executor 1) (8/10)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 2794 ms on sh03 (executor 1) (9/10)
22/11/19 18:38:05 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 2800 ms on sh03 (executor 1) (10/10)
22/11/19 18:38:05 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/11/19 18:38:05 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 8.174 s
22/11/19 18:38:05 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 8.233929 s
Pi is roughly 3.1405671405671405
22/11/19 18:38:05 INFO server.AbstractConnector: Stopped Spark@62b3df3a{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/11/19 18:38:05 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.9.250:4040
22/11/19 18:38:05 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/11/19 18:38:05 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/11/19 18:38:05 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/11/19 18:38:05 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
22/11/19 18:38:05 INFO cluster.YarnClientSchedulerBackend: Stopped
22/11/19 18:38:05 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/11/19 18:38:05 INFO memory.MemoryStore: MemoryStore cleared
22/11/19 18:38:05 INFO storage.BlockManager: BlockManager stopped
22/11/19 18:38:05 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/11/19 18:38:05 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/11/19 18:38:05 INFO spark.SparkContext: Successfully stopped SparkContext
22/11/19 18:38:05 INFO util.ShutdownHookManager: Shutdown hook called
22/11/19 18:38:05 INFO util.ShutdownHookManager: Deleting directory /private/var/folders/pc/mj2v_vln4x14q6jylbtnmvx40000gn/T/spark-5ece9ef1-aff6-451e-bf36-b637d4afb74d
22/11/19 18:38:05 INFO util.ShutdownHookManager: Deleting directory /usr/local/spark-2.4.8-bin-hadoop2.7/tmp/spark-7ecf7a1c-87e6-4f76-8e50-cd1682762c25


Source: blog.csdn.net/yy_diego/article/details/127953198