Specify the heap memory when Spark starts the Executor process

CommandUtils is one of the most commonly used tools in Spark, and its role is to build processes. If you don't care about its implementation, it will not affect the reading of Spark source code and the study of principles. The method we want to introduce is as follows:

buildProcessBuilder

  def buildProcessBuilder(
      command: Command,
      securityMgr: SecurityManager,
      memory: Int,
      sparkHome: String,
      substituteArguments: String => String,
      classPaths: Seq[String] = Seq[String](),
      env: Map[String, String] = sys.env): ProcessBuilder = {
    val localCommand = buildLocalCommand(
      command, securityMgr, substituteArguments, classPaths, env)
    val commandSeq = buildCommandSeq(localCommand, memory, sparkHome)
    val builder = new ProcessBuilder(commandSeq: _*)
    val environment = builder.environment()
    for ((key, value) <- localCommand.environment) {
      environment.put(key, value)
    }
    builder
  }

buildLocalCommand

  private def buildLocalCommand(
      command: Command,
      securityMgr: SecurityManager,
      substituteArguments: String => String,
      classPath: Seq[String] = Seq[String](),
      env: Map[String, String]): Command = {
    val libraryPathName = Utils.libraryPathEnvName
    val libraryPathEntries = command.libraryPathEntries
    val cmdLibraryPath = command.environment.get(libraryPathName)
 
    var newEnvironment = if (libraryPathEntries.nonEmpty && libraryPathName.nonEmpty) {
      val libraryPaths = libraryPathEntries ++ cmdLibraryPath ++ env.get(libraryPathName)
      command.environment + ((libraryPathName, libraryPaths.mkString(File.pathSeparator)))
    } else {
      command.environment
    }
 
    if (securityMgr.isAuthenticationEnabled) {
      newEnvironment += (SecurityManager.ENV_AUTH_SECRET -> securityMgr.getSecretKey)
    }
 
    Command(
      command.mainClass,
      command.arguments.map(substituteArguments),
      newEnvironment,
      command.classPathEntries ++ classPath,
      Seq[String](), // library path already captured in environment variable
      command.javaOpts.filterNot(_.startsWith("-D" + SecurityManager.SPARK_AUTH_SECRET_CONF)))
  }

buildCommandSeq

  private def buildCommandSeq(command: Command, memory: Int, sparkHome: String): Seq[String] = {
    val cmd = new WorkerCommandBuilder(sparkHome, memory, command).buildCommand()
    cmd.asScala ++ Seq(command.mainClass) ++ command.arguments
  }

After introducing the CommandUtils command tool class, I formally enter the topic. In fact, the start of the Executor process is to generate ProcessBuilder by calling the CommandUtils.buildProcessBuilder method, and then execute its start method to start ProcessBuilder to generate the process. Here focuses on how to build cmd. The buildCommandSep method is called in the buildProcessBuilder. The last line of this method has clearly seen the structure of the command (CommandSeq) ([javaOpt + classPath] + mainclass + args), and then analyze the buildCommand method of the WorkerCommandBuilder.

/**
 * This class is used by CommandUtils. It uses some package-private APIs in SparkLauncher, and since
 * Java doesn't have a feature similar to `private[spark]`, and we don't want that class to be
 * public, needs to live in the same package as the rest of the library.
 */
private[spark] class WorkerCommandBuilder(sparkHome: String, memoryMb: Int, command: Command)
    extends AbstractCommandBuilder {

  childEnv.putAll(command.environment.asJava)
  childEnv.put(CommandBuilderUtils.ENV_SPARK_HOME, sparkHome)

  override def buildCommand(env: JMap[String, String]): JList[String] = {
    val cmd = buildJavaCommand(command.classPathEntries.mkString(File.pathSeparator))
    cmd.add(s"-Xmx${memoryMb}M")
    command.javaOpts.foreach(cmd.add)
    cmd
  }

  def buildCommand(): JList[String] = buildCommand(new JHashMap[String, String]())

}

You can see that the -Xmx${memoryMb}M was spliced ​​when building the cmd command, and the memoryMb here is the executor-memory value in SparkConf. The truth is revealed here, and the executor-memory we set is finally used for startup The Executor process specifies the maximum heap.

Guess you like

Origin blog.csdn.net/qq_32445015/article/details/104899634