Big Data Interview Questions (9): Spark Interview Questions

"I stumbled on a giant cow artificial intelligence course, could not help but share to everyone. Tutorial is not only a zero-based, user-friendly, and very humorous, like watching a fiction! Think too much bad, so to others. the point where you can jump to the tutorial.. "


Table of Contents

1. When Spark master HA is implemented with ZooKeeper, what metadata is stored in ZooKeeper?
2. Why does the active/standby switch in Spark master HA not affect jobs already running on the cluster?
3. In Spark on Mesos, what is coarse-grained allocation, what is fine-grained allocation, and what are their strengths and weaknesses?
4. How do you configure Spark master HA?
5. What are the common stable versions of Apache Spark, and what do the digits in Spark 1.6.0 mean?
6. What are the functions of the driver?


1. When Spark master HA is implemented with ZooKeeper, what metadata is stored in ZooKeeper?

       A: Spark stores the master's metadata in ZooKeeper at the location given by the parameter spark.deploy.zookeeper.dir, including information about Workers, Drivers, Applications, and Executors. The standby node obtains this metadata from ZooKeeper and restores the cluster state so it can continue to serve requests; until recovery finishes, requests such as application and job submission are not accepted. In addition, note two points about a master switch:
       1) During a master switch, all programs that are already running keep operating normally. A Spark application that has started has already obtained its computing resources from the cluster manager, so its runtime job scheduling and processing have nothing to do with the master.
       2) The only impact during a master switch is on new jobs: on the one hand, no new application can be submitted to the cluster, because only the Active master can accept the request to submit a new program; on the other hand, programs that are already running cannot submit requests for new jobs triggered by actions either.
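For illustration, here is a minimal Scala sketch (not part of the original answer) that lists the znodes Spark keeps under the recovery directory, using Apache Curator (assuming curator-framework is on the classpath); the quorum address and the /spark directory are assumptions that match the HA configuration shown in question 4:

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.retry.ExponentialBackoffRetry
    import scala.collection.JavaConverters._

    object InspectSparkHAMetadata {
      def main(args: Array[String]): Unit = {
        // Assumed quorum; use your own spark.deploy.zookeeper.url value
        val client = CuratorFrameworkFactory.newClient(
          "zk01:2181,zk02:2181,zk03:2181",
          new ExponentialBackoffRetry(1000, 3))
        client.start()
        // Assumed recovery directory; must match spark.deploy.zookeeper.dir.
        // It typically holds the leader-election node and the persisted master state.
        client.getChildren.forPath("/spark").asScala.foreach(println)
        client.close()
      }
    }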

2. Why does the active/standby switch in Spark master HA not affect jobs already running on the cluster?

       A: Because the running program has already been granted its resources before the switch; the driver communicates directly with the executors and does not need to communicate with the master.

3. In Spark on Mesos, what is coarse-grained allocation, what is fine-grained allocation, and what are their strengths and weaknesses?

       1) Coarse-grained allocation: resources are allocated once when the program starts, and the program keeps using those allocated resources afterwards; no further allocation is needed. Benefit: well suited to long-running jobs with a high rate of resource reuse. Drawback: it easily wastes resources; if a job has 1000 tasks and 999 of them have finished while one is still running, under coarse-grained allocation the resources of the 999 finished tasks stay idle and are wasted.
       2) Fine-grained allocation: resources are allocated when needed and reclaimed immediately after use. The drawback is slower, more troublesome startup, because every task triggers an allocation of its own; see the configuration sketch below.
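As a concrete illustration, a minimal Scala sketch (not from the original answer) of how the mode is selected: in Spark 1.x on Mesos the switch is the spark.mesos.coarse property (fine-grained mode was later deprecated in Spark 2.0); the Mesos master URL and the core cap below are assumed values:

    import org.apache.spark.{SparkConf, SparkContext}

    object MesosGranularityDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("MesosGranularityDemo")
          .setMaster("mesos://zk://zk01:2181/mesos")  // assumed Mesos master URL
          .set("spark.mesos.coarse", "true")          // "true" = coarse-grained, "false" = fine-grained
          .set("spark.cores.max", "8")                // cap total cores so coarse mode does not hold the whole cluster
        val sc = new SparkContext(conf)
        println(sc.parallelize(1 to 1000).count())    // runs under the chosen allocation mode
        sc.stop()
      }
    }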

4. How do you configure Spark master HA?
  1. Set up a ZooKeeper cluster.
  2. Modify spark-env.sh: do not specify a fixed spark master address (e.g. SPARK_MASTER_IP), and add the following to the file on every master node:

    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk01:2181,zk02:2181,zk03:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

  3. Distribute spark-env.sh to every node.
  4. On one master node run ./sbin/start-all.sh; the primary master starts there. On each of the other (backup) master nodes, start the master with: ./sbin/start-master.sh
  5. When submitting a program, specify all three masters, for example ./spark-shell --master spark://master01:7077,master02:7077,master03:7077 (the same URL can be used from application code, as in the sketch after this list).
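The same comma-separated master URL can also be set from application code; a minimal sketch, assuming the host names from step 5:

    import org.apache.spark.{SparkConf, SparkContext}

    // The client tries each listed master and connects to whichever is Active,
    // so a master failover does not require changing the application.
    val conf = new SparkConf()
      .setAppName("HADemo")
      .setMaster("spark://master01:7077,master02:7077,master03:7077")
    val sc = new SparkContext(conf)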
5. What are the common stable versions of Apache Spark, and what do the digits in Spark 1.6.0 mean?

A: The common major stable versions are Spark 1.3, Spark 1.6, and Spark 2.0. The digits in Spark 1.6.0 mean the following:

  1. First digit: 1
major version: indicates a major release; there are usually some API changes, along with large optimizations or structural changes.
  2. Second digit: 6
minor version: indicates a minor release; usually new APIs are added, existing APIs are optimized, or other content such as the web UI is updated.
  3. Third digit: 0
patch version: indicates fixes for bugs in the current minor version, with essentially no API changes or feature updates. A piece of often-quoted advice: if you have to pick a Spark version, prefer one whose patch version is not 0, because releases like 1.2.0 or 1.6.0 are big updates and may carry hidden bugs or instability, so versions like 1.2.1 or 1.6.1 are the better choice.
From this explanation of the version number it is easy to see that, for example, Spark 2.1.1 was released to fix bugs in the 2.1 line; it adds no new features and no new APIs, and is more stable than 2.1.0.
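As a quick check of the scheme above, the running version can be read at runtime; a tiny sketch for spark-shell, where sc is predefined and the version string is assumed to be a plain major.minor.patch triple:

    // Prints e.g. "1.6.0"
    println(sc.version)
    // Split into the three digits discussed above: major = 1, minor = 6, patch = 0
    val Array(major, minor, patch) = sc.version.split("\\.")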
6. What are the functions of the driver?
  1. The Spark driver is the main process of a running job: it contains the main function and instantiates the SparkContext, which is the entry point of the program.
  2. Functions: it requests resources from the cluster, registers information with the master, and is responsible for job scheduling and job parsing, generating stages and scheduling tasks to the executors. It contains the DAGScheduler and the TaskScheduler. A minimal driver sketch follows below.
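To make those responsibilities concrete, here is a minimal driver program in Scala (an illustration, not code from the original post):

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverDemo {
      // main() runs in the driver process
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("DriverDemo")
        val sc = new SparkContext(conf)           // entry point: registers with the master and requests executors
        val rdd = sc.parallelize(1 to 100, 4)     // lazy: only builds the lineage/DAG
        val total = rdd.map(_ * 2).reduce(_ + _)  // action: DAGScheduler splits stages, TaskScheduler ships tasks to executors
        println(s"total = $total")
        sc.stop()
      }
    }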