1 Overview
Hadoop is usually deployed on a server, so a MapReduce job cannot be run directly from a Windows development machine. Instead, the job is exported as a jar package, uploaded to the server, and run with the command:
$ hadoop jar [jar file] [main class] [input file] [output file]
Note: The main class must be given as its fully qualified class name.
However, when a MapReduce job is run this way, we cannot step through the source code with breakpoints. Remote debugging solves this problem.
2 Solutions
2.1 The server starts a listening service
To debug with remote breakpoints, the JVM that runs the MapReduce job on the remote server must open a listening debug service at startup and pause until a client connects. This is done by passing the following option to the JVM at runtime:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000
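Note: This option is provided by the JDK. Running java -agentlib:jdwp=help prints a description of each sub-option. Here transport=dt_socket selects a socket connection, server=y makes the JVM listen for a debugger, suspend=y pauses the JVM until a debugger attaches, and address=8000 sets the listening port to 8000. On JDK 9 and later, a bare port number accepts connections from the local machine only, so address=*:8000 is needed for a debugger on another machine to connect.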
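The option string above can also be assembled by a small shell function, which makes it easier to change the port or the suspend flag. This is only a sketch: jdwp_agent_opt is a hypothetical helper name, not something shipped with Hadoop or the JDK.

```shell
# Hypothetical helper (not part of Hadoop or the JDK): print the JDWP agent option.
#   $1 = listening port (defaults to 8000)
#   $2 = suspend flag, y or n (defaults to y)
jdwp_agent_opt() {
  # "--" stops printf from treating the leading "-" of the format as an option.
  printf -- '-agentlib:jdwp=transport=dt_socket,server=y,suspend=%s,address=%s\n' \
    "${2:-y}" "${1:-8000}"
}

# Print the option used in this article.
jdwp_agent_opt 8000 y
```

Passing n as the second argument would let the JVM start without waiting for the debugger, which may be preferable when only later stages of the program need to be inspected.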
2.2 Where should the parameters be configured?
MapReduce jobs are started with the hadoop jar command, so the place to look is the hadoop launch script in the bin directory of the Hadoop installation. Before it starts the program, the script assembles the JVM options, and the contents of the HADOOP_CLIENT_OPTS environment variable are included among them. The debug option can therefore be placed in HADOOP_CLIENT_OPTS, and the variable can be set temporarily with the export command in the current shell.
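The mechanism relies on an exported variable being inherited by child processes such as the hadoop script. As a possible variation, the sketch below exports the variable inside a subshell so that only one command sees the debug option. The function name run_with_jdwp is hypothetical, and the demonstration uses a harmless sh child process in place of hadoop.

```shell
# Hypothetical helper: run one command with the JDWP agent appended to
# HADOOP_CLIENT_OPTS, leaving the variable in the calling shell untouched.
run_with_jdwp() {
  (
    # The parentheses start a subshell, so this export ends with the command.
    export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:-} -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
    "$@"
  )
}

# Demonstration with a child process that only prints what it inherited.
run_with_jdwp sh -c 'echo "child sees: $HADOOP_CLIENT_OPTS"'
```

On the server this would be invoked as run_with_jdwp hadoop jar followed by the usual arguments. Other hadoop commands typed later in the same shell then start normally.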
3 [Summary] Remote debugging configuration steps
From the analysis above, the server needs only one extra runtime option, after which the job can be debugged remotely from IntelliJ IDEA. The configuration steps are as follows:
3.1 Remote server configuration
(1) Add a temporary environment variable on the server side
$ export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
(2) Run the MapReduce job on the server, for example:
$ hadoop jar mapreduce-task.jar com.os.china.mapreduce.weather.JobRun /file/weather.txt /out
At this point the program is suspended at startup and listens on port 8000. The JVM prints a line such as "Listening for transport dt_socket at address: 8000" to the console.
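To confirm from a second terminal that the port is open without connecting to it, the output of ss -ltn can be inspected. The sketch below assumes a Linux server with the ss tool installed. is_listening is a hypothetical helper that reads ss -ltn style output on standard input.

```shell
# Hypothetical helper: read `ss -ltn` style output on stdin and succeed
# if some socket is in LISTEN state on the given port ($1).
is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      # Column 4 is "local address:port"; the port is the last ":"-separated part.
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Only query the system when ss is available.
if command -v ss >/dev/null 2>&1; then
  if ss -ltn | is_listening 8000; then
    echo "port 8000 is listening"
  else
    echo "port 8000 is not listening"
  fi
fi
```

Reading the socket table is chosen here instead of opening a test connection, because a stray connection to the debug port could be mistaken by the JVM for a debugger trying to attach.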
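3.2 Remote debugging configuration in IDEA on the local Windows machine
(1) Create a new remote debug configuration in IDEA (under Run > Edit Configurations; the configuration type is named "Remote" or "Remote JVM Debug" depending on the IDEA version)
(2) Click OK after filling in the configuration
Note: Host is the IP address of the remote server, and Port is the port the remote server is listening on (8000 in this example)
(3) Start the configuration in Debug mode. IDEA attaches to the suspended JVM, the program resumes, and breakpoints in the job's source code are hit as the JVM started by hadoop jar executes them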
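Because the variable was exported in the current shell on the server, later hadoop commands in that shell would also pause and wait for a debugger. One way to undo this after the debugging session is sketched below. strip_jdwp_opt is a hypothetical helper, and it assumes the agent option contains no spaces.

```shell
# Hypothetical helper: print the given option string ($1) with any
# -agentlib:jdwp=... option (and the spaces before it) removed.
strip_jdwp_opt() {
  printf '%s\n' "$1" | sed -e 's/ *-agentlib:jdwp=[^ ]*//g'
}

# Remove the agent from HADOOP_CLIENT_OPTS in the current shell,
# keeping any other options that were already set.
HADOOP_CLIENT_OPTS="$(strip_jdwp_opt "${HADOOP_CLIENT_OPTS:-}")"
export HADOOP_CLIENT_OPTS
```

Closing the shell session, or running unset HADOOP_CLIENT_OPTS when the variable held nothing else, has the same effect.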