Introduction
On Windows, JVisualVM typically ships with the JDK at ${JAVA_HOME}/bin/jvisualvm.exe. It supports two ways of connecting to a remote JVM: JMX (local and remote) and jstatd.
jstatd (the Java Virtual Machine jstat Daemon): monitors CPU, memory, threads, and other metrics on a remote server.
JMX (Java Management Extensions) is a framework for embedding management functionality into applications, devices, and systems. JMX works across heterogeneous operating systems, system architectures, and network transport protocols, allowing flexible development of seamlessly integrated system, network, and service management applications.
Note: I did not manage to get jstatd working, so to avoid misleading anyone this article covers only the JMX approach.
JMX monitoring
Typical configuration:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=<ip> -Dcom.sun.management.jmxremote.port=<port>
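These are standard JDK options and work for any JVM, not just Spark. As a quick standalone sanity check, a minimal sketch (the IP 10.0.0.1, port 9010, and myapp.jar are placeholders):

# Expose JMX without SSL or authentication on a fixed port
java \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Djava.rmi.server.hostname=10.0.0.1 \
  -Dcom.sun.management.jmxremote.port=9010 \
  -jar myapp.jar

Note that with authenticate=false and ssl=false the port is open to anyone who can reach it, so only use this on trusted networks.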
Adding JMX configuration:
To monitor executors in Spark, the Spark application must be started with JMX enabled. There are three ways to configure it:
1) Set the parameters in spark-defaults.conf (see the sketch below)
2) In spark-env.sh, configure the Java options for the master and workers (see the sketch below)
3) Pass the options when submitting with spark-submit
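For reference, methods 1) and 2) look roughly like the following. These are sketches: spark.executor.extraJavaOptions is a standard Spark property, but the spark-env.sh variable names (SPARK_MASTER_OPTS, SPARK_WORKER_OPTS) should be checked against your Spark version's documentation.

# Method 1: one line in spark-defaults.conf
spark.executor.extraJavaOptions -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

# Method 2: spark-env.sh (standalone master/worker daemons)
SPARK_MASTER_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
SPARK_WORKER_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"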
Method 3), passing the configuration at spark-submit time, is used here:
spark-submit \
  --class myTest.KafkaWordCount \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
  --verbose \
  --executor-memory 1G \
  --total-executor-cores 6 \
  /hadoop/spark/app/spark/20151223/testSpark.jar *.*.*.*:* test3 wordcount 4 kafkawordcount3 checkpoint4
Notes:
1) Do not specify a fixed IP and port, because when Spark runs it is likely to place multiple container processes on the same node; they would then compete for the same port, and the application submitted via spark-submit would fail (illustrated below).
2) Since no fixed IP and port are given (port=0), each executor is automatically assigned a free port when the tasks are started.
3) The three configuration methods above may monitor at different scopes: spark-submit applies to a single application, while spark-env.sh may apply globally to all executors on a node [unverified], so readers should check this themselves.
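To make note 1) concrete: a fixed port clashes as soon as two executors land on the same host, while port 0 lets each executor JVM pick its own free port. A sketch of the two variants (9999 is an arbitrary example port):

# Fails when two executors share a node: both JVMs try to bind 9999
--conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"

# Works: each executor JVM is assigned a random free port
--conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"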
Finding the allocated JMX port
First, use yarn applicationattempt -list <applicationId> to find the application attempt ID:
[root@cdh-143 bin]# yarn applicationattempt -list application_1559203334026_0015
19/06/01 17:57:18 INFO client.RMProxy: Connecting to ResourceManager at CDH-143/10.dx.dx.143:8032
Total number of application attempts :1
ApplicationAttempt-Id                 State    AM-Container-Id                          Tracking-URL
appattempt_1559203334026_0015_000001  RUNNING  container_1559203334026_0015_01_000001  http://CDH-143:8088/proxy/application_1559203334026_0015/
Then use yarn container -list <applicationAttemptId> to get the list of container IDs:
[root@cdh-143 bin]# yarn container -list appattempt_1559203334026_0015_000001
19/06/01 17:57:52 INFO client.RMProxy: Connecting to ResourceManager at CDH-143/10.dx.dx.143:8032
Total number of containers :16
Container-Id                            Start Time                      Finish Time  State    Host          LOG-URL
container_1559203334026_0015_01_000012  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000012/dx
container_1559203334026_0015_01_000013  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000013/dx
container_1559203334026_0015_01_000010  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000010/dx
container_1559203334026_0015_01_000011  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000011/dx
container_1559203334026_0015_01_000016  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000016/dx
container_1559203334026_0015_01_000014  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000014/dx
container_1559203334026_0015_01_000015  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-146:8041  http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000015/dx
container_1559203334026_0015_01_000004  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000004/dx
container_1559203334026_0015_01_000005  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000005/dx
container_1559203334026_0015_01_000002  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000002/dx
container_1559203334026_0015_01_000003  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000003/dx
container_1559203334026_0015_01_000008  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000008/dx
container_1559203334026_0015_01_000009  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000009/dx
container_1559203334026_0015_01_000006  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000006/dx
container_1559203334026_0015_01_000007  Sat Jun 01 13:27:52 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000007/dx
container_1559203334026_0015_01_000001  Sat Jun 01 13:27:38 +0800 2019  N/A          RUNNING  CDH-142:8041  http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000001/dx
Log in to the node where the target executor's container is running and use the following command to find the running process and its PID:
[root@cdh-146 ~]# ps -axu | grep container_1559203334026_0015_01_000013
yarn  8844  0.0  0.0 113144  1496 ?      S   13:27  0:00 bash /data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/default_container_executor.sh
yarn  8857  0.0  0.0 113280  1520 ?      Ss  13:27  0:00 /bin/bash -c /usr/java/jdk1.8.0_171-amd64/bin/java -server -Xmx6144m '-Dcom.sun.management.jmxremote' '-Dcom.sun.management.jmxremote.port=0' '-Dcom.sun.management.jmxremote.authenticate=false' '-Dcom.sun.management.jmxremote.ssl=false' -Djava.io.tmpdir=/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/tmp '-Dspark.network.timeout=10000000' '-Dspark.driver.port=47564' '-Dspark.port.maxRetries=32' -Dspark.yarn.app.container.log.dir=/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@CDH-143:47564 --executor-id 12 --hostname CDH-146 --cores 2 --app-id application_1559203334026_0015 --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/__app__.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/streaming-dx-perf-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-common-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-sql-kafka-0-10_2.11-2.4.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-avro_2.11-3.2.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/shc-core-1.1.2-2.2-s_2.11-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/rocksdbjni-5.17.2.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/kafka-clients-0.10.0.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/elasticsearch-spark-20_2.11-6.4.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx_Spark_State_Store_Plugin-1.0-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-core_2.11-0.9.5.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-avro_2.11-0.9.5.jar 1>/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013/stdout 2>/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013/stderr
yarn  9000  143  3.3 8736712 4379648 ?   Sl  13:27  24:35 /usr/java/jdk1.8.0_171-amd64/bin/java -server -Xmx6144m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.io.tmpdir=/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/tmp -Dspark.network.timeout=10000000 -Dspark.driver.port=47564 -Dspark.port.maxRetries=32 -Dspark.yarn.app.container.log.dir=/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@CDH-143:47564 --executor-id 12 --hostname CDH-146 --cores 2 --app-id application_1559203334026_0015 --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/__app__.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-domain-perf-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-common-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-sql-kafka-0-10_2.11-2.4.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-avro_2.11-3.2.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/shc-core-1.1.2-2.2-s_2.11-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/rocksdbjni-5.17.2.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/kafka-clients-0.10.0.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/elasticsearch-spark-20_2.11-6.4.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx_Spark_State_Store_Plugin-1.0-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-core_2.11-0.9.5.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-avro_2.11-0.9.5.jar
root 25939  0.0  0.0 112780   956 pts/1  S+  13:45  0:00 grep --color=auto container_1559203334026_0015_01_000013
The actual executor JVM is the java process (PID 9000); the first two entries are only its bash wrappers. Then use that PID to find the corresponding JMX port:
[root@cdh-146 ~]# sudo netstat -antp | grep 9000
tcp   0      0 10.dx.dx.146:9000    0.0.0.0:*            LISTEN       2642/python2.7
tcp6  0      0 :::48169             :::*                 LISTEN       9000/java
tcp6  0      0 :::37692             :::*                 LISTEN       9000/java
tcp6  0      0 10.dx.dx.146:52710   :::*                 LISTEN       9000/java
tcp6  0      0 10.dx.dx.146:55535   10.dx.dx.142:38397   ESTABLISHED  9000/java
tcp6  64088  0 10.dx.dx.146:45410   10.206.186.35:9092   ESTABLISHED  9000/java
tcp6  0      0 10.dx.dx.146:60259   10.dx.dx.143:47564   ESTABLISHED  9000/java
From this output, the JMX port is presumably 48169 or 37692 (the ports PID 9000 is listening on); try each to connect to the corresponding Spark executor. The whole lookup can also be scripted, as sketched below.
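The manual steps (ps for the PID, netstat for the listening ports) can be wrapped in a small script. A rough sketch, assuming passwordless SSH to the worker nodes and a single executor java process per container; find_jmx_ports.sh is a hypothetical helper, not part of Spark or YARN:

#!/bin/bash
# Hypothetical helper: list candidate JMX ports for one YARN container.
# Usage: ./find_jmx_ports.sh <container_id> <node_host>
CONTAINER_ID=$1
NODE_HOST=$2

# Find the executor's java PID on the node. The bash wrapper processes
# and the remote grep itself are filtered out, leaving the java process.
PID=$(ssh "$NODE_HOST" "ps -axu | grep $CONTAINER_ID | grep -v bash | grep -v grep" | awk '{print $2}')
echo "executor PID on $NODE_HOST: $PID"

# List the ports that PID is listening on; the JMX port is among them.
ssh "$NODE_HOST" "sudo netstat -antp | grep LISTEN | grep '$PID/java'"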
Adding the connection in the monitoring tool jvisualvm.exe
On your local Windows machine, locate ${JAVA_HOME}/bin/jvisualvm.exe in the JDK directory and run it. After it starts, right-click "Remote" and add a JMX connection.
Fill in the IP of the node where the executor is running, together with the JMX port found above (e.g. 10.dx.dx.146:48169).
Then you can start monitoring.
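If you want to verify which candidate port is the JMX port before setting up the JVisualVM connection, jconsole (also shipped in ${JAVA_HOME}/bin) accepts a host:port argument directly; using the candidate found above:

jconsole 10.dx.dx.146:48169

If jconsole attaches successfully, that is the right port; otherwise try the other candidate (37692).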