CDH + Kylin Trilogy Part Two: Deployment and Setup

This is the second article in the "CDH + Kylin Trilogy" series. The previous article, "CDH + Kylin Trilogy: Preparations", prepared the required machines and files, so CDH and Kylin can now be deployed.

Execute ansible script to deploy CDH and Kylin (ansible computer)

  1. Enter the ~/playbooks directory on the ansible computer. After the preparation in the previous article, the directory should contain the following:
    Insert picture description here
  2. Check that ansible can operate the CDH server remotely. Run the command ansible deskmini -a "free -m"; if everything is normal, it displays the memory information of the CDH server, as shown below:
    Insert picture description here
  3. Execute the command to start deployment: ansible-playbook cm6-cdh5-kylin264-single-install.yml
  4. The deployment involves time-consuming operations such as online installation and the transfer of large files, so please wait patiently (about half an hour). If it exits with an error during deployment (for example, because of a network problem), simply run the command above again; the ansible operations are idempotent, so repeating them is safe;
  5. The successful deployment is shown below:
    Insert picture description here
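
Because rerunning the playbook after a failure is safe, the rerun can be automated. The following is only a minimal sketch: the retry helper and the RETRY_DELAY variable are my own hypothetical names, not part of ansible or of the playbooks from the previous article.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS runs have failed.
# RETRY_DELAY (seconds between attempts, default 10) is a hypothetical knob.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        # Only wait when another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "${RETRY_DELAY:-10}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Intended use on the ansible computer (commented out so that sourcing this
# file has no side effects):
#   cd ~/playbooks && retry 3 ansible-playbook cm6-cdh5-kylin264-single-install.yml
```

Three attempts is an arbitrary choice. If the same error appears on every attempt, it is probably not a network problem and is worth reading before retrying again.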

Restart the CDH server

Because the selinux and swap settings were modified, the operating system must be restarted for them to take effect, so please restart the CDH server.

Execute ansible script to start CDH service (ansible computer)

  1. Wait for the CDH server to restart successfully;
  2. Log in to the ansible computer and enter the ~/playbooks directory;
  3. Run the playbook that initializes the database and starts CDH: ansible-playbook cdh-single-start.yml
  4. After the startup is complete, the following information is output:
    Insert picture description here
  5. Log in to the CDH server via SSH and run this command to watch the CDH service start: tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log. When you see the content in the red box below, startup has finished and you can log in with a browser:
    Insert picture description here
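
Instead of watching tail -f by eye, the wait can be scripted. This is a rough sketch under one loud assumption: the marker text "Started Jetty server" is what I would expect Cloudera Manager to log when its web server is ready, but the article only shows the marker in a screenshot, so check it against your own log. The function name cm_server_started is hypothetical.

```shell
# Returns 0 once the marker line appears in the Cloudera Manager server log.
# ASSUMPTION: "Started Jetty server" is the readiness marker; the article only
# shows it in a screenshot, so confirm it against your own log.
cm_server_started() {
    log_file=${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}
    grep -q "Started Jetty server" "$log_file" 2>/dev/null
}

# Intended use on the CDH server (commented out; it blocks until ready):
#   until cm_server_started; do sleep 5; done; echo "Cloudera Manager is up"
```

Note that an old log from a previous start may already contain the marker, so this check is only meaningful for the first start or after the log has rotated.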

Settings (browser operation)

Now that the CDH service has been started, you can operate it through the browser:

  1. Open http://192.168.50.134:7180 in a browser, as shown below. The account and the password are both admin:
    Insert picture description here
  2. Click Next through the following pages, and select the 60-day trial on the edition selection page:
    Insert picture description here
  3. On the host selection page, you can see the CDH server (deskmini):
    Insert picture description here
  4. On the page for selecting the CDH version, please select 5.16.2-1 in the red box below:
    Insert picture description here
  5. Go to the Parcel installation page. Because the offline parcel package was uploaded in advance, the download progress reaches 100% instantly. Then wait for distribution, unpacking, and activation to complete:
    Insert picture description here
  6. Next come some recommended operations. You can skip them here, as shown in the red box below:
    Insert picture description here
  7. Next is the service selection page. I chose custom services and selected HBase, HDFS, Hive, Hue, Oozie, Spark, YARN, and ZooKeeper, which is enough to run Kylin:
    Insert picture description here
  8. On the host selection page, select the CDH server:
    Insert picture description here
  9. The next page is the database settings. What you fill in must match the figure below: the host name is localhost; for Hive, the database, user, and password are all hive; for Activity Monitor, they are all amon; for Reports Manager, they are all rman; for Oozie Server, they are all oozie; and for Hue, they are all hue. These values are fixed in the ansible script, so what you enter here must match them:
    Insert picture description here
  10. On the parameter settings page, choose values that suit your own disks. I have plenty of space under /home, so I changed the storage locations to the /home directory:
    Insert picture description here
  11. Wait for the service to start:
    Insert picture description here
  12. All services have started:
    Insert picture description here
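
The database values in step 9 are easy to mistype, so here they are again as a table. The snippet below is just a convenience sketch that prints the values stated in that step; the function name print_db_settings is hypothetical and it does not connect to any database.

```shell
# Prints the database settings from step 9 as an aligned table.
# For every service the database name, user, and password are the same word.
print_db_settings() {
    printf '%-16s %-10s %-9s %-9s %s\n' "Service" "Host" "Database" "User" "Password"
    for pair in "Hive:hive" "Activity Monitor:amon" "Reports Manager:rman" \
                "Oozie Server:oozie" "Hue:hue"; do
        service=${pair%%:*}   # text before the colon
        value=${pair#*:}      # text after the colon
        printf '%-16s %-10s %-9s %-9s %s\n' "$service" "localhost" "$value" "$value" "$value"
    done
}

print_db_settings
```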

HDFS settings

  1. As shown in the red box below, there is a problem with the HDFS service:
    Insert picture description here
  2. Click the red exclamation mark in the picture above to see the details of the problem. The picture below shows a common replica-count problem:
    Insert picture description here
  3. As shown in the figure below, on the HDFS configuration page, set dfs.replication to 1 (there is only one data node):
    Insert picture description here
  4. After the change above, the default replica count is 1, but existing files keep their old replica count and must be reset. Log in to the CDH server via SSH;
  5. Run su - hdfs to switch to the hdfs account, and then run the following command to set the replica count of existing files:
hadoop fs -setrep -R 1 /
  6. Go back to the web page and restart the HDFS service, as shown below:
    Insert picture description here
  7. After the restart, the HDFS service is normal:
    Insert picture description here
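
To confirm from the command line that the replica problem is gone, the output of hdfs fsck can be checked. This is a sketch, assuming the fsck summary contains a line of the form "Under-replicated blocks:" followed by a count (which is what I see on Hadoop 2.x; verify on your version). The function name under_replicated_blocks is hypothetical.

```shell
# Reads `hdfs fsck /` output on stdin and prints the under-replicated block
# count. Exits non-zero if the summary line is missing.
# ASSUMPTION: the summary line looks like " Under-replicated blocks:  0 (0.0 %)".
under_replicated_blocks() {
    awk -F: '
        /Under-replicated blocks/ {
            # The first whitespace-separated token after the colon is the count.
            split($2, parts, " ")
            print parts[1]
            found = 1
        }
        END { if (!found) exit 1 }
    '
}

# Intended use on the CDH server as the hdfs account (commented out):
#   hdfs fsck / | under_replicated_blocks
```

With a single data node and a replica count of 1, the printed value should be 0.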

YARN settings

The default YARN parameters are very conservative, and some settings need to be made to successfully execute Spark tasks:

  1. Enter the YARN management page;
  2. Check the value of the parameter yarn.nodemanager.resource.cpu-vcores, as shown in the figure below. It must be greater than 1; otherwise, after a Spark task is submitted, YARN does not allocate resources to run it. If your CDH server is a single-core virtual machine, this parameter is set to 1; the solution is to increase the number of CPU cores of the virtual machine and then modify this parameter:
    Insert picture description here
  3. yarn.scheduler.minimum-allocation-mb: the minimum memory that a single container can request. I set it to 1 GB;
  4. yarn.scheduler.maximum-allocation-mb: the maximum memory that a single container can request. I set it to 8 GB;
  5. yarn.nodemanager.resource.memory-mb: the maximum memory available on the node. I set it to 8 GB;
  6. The three values above are based on my CDH server, which has 32 GB of memory. Please adjust them to your own hardware resources;
  7. After changing the settings, restart the YARN service, as shown in the figure below:
    Insert picture description here
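
Before typing the values into the YARN page, a quick sanity check can catch inconsistent numbers. The sketch below encodes the rule from step 2 (more than one vcore) plus two ordering checks that are my own assumption of what is sensible: the minimum allocation should not exceed the maximum, and the maximum should not exceed the node's memory. The function name check_yarn_settings is hypothetical, and all memory values are in MB.

```shell
# check_yarn_settings VCORES MIN_MB MAX_MB NODE_MB
# Returns 0 if the four values look consistent, 1 otherwise.
check_yarn_settings() {
    vcores=$1
    min_mb=$2
    max_mb=$3
    node_mb=$4
    status=0
    if [ "$vcores" -le 1 ]; then
        echo "yarn.nodemanager.resource.cpu-vcores must be greater than 1 (got $vcores)" >&2
        status=1
    fi
    if [ "$min_mb" -gt "$max_mb" ]; then
        echo "minimum-allocation-mb ($min_mb) exceeds maximum-allocation-mb ($max_mb)" >&2
        status=1
    fi
    if [ "$max_mb" -gt "$node_mb" ]; then
        echo "maximum-allocation-mb ($max_mb) exceeds nodemanager memory-mb ($node_mb)" >&2
        status=1
    fi
    return "$status"
}

# The memory values from this article (1 GB, 8 GB, 8 GB) expressed in MB;
# the vcore count of 4 is only an example.
check_yarn_settings 4 1024 8192 8192 && echo "settings look consistent"
```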

Spark settings (CDH server)

You need to prepare a directory and the related jars in the Spark environment; otherwise Kylin fails at startup with the error "spark not found, set SPARK_HOME, or run bin/download-spark.sh". Log in to the CDH server via SSH as root and run the following command:

mkdir -p "$SPARK_HOME/jars" \
&& cp "$SPARK_HOME"/assembly/lib/*.jar "$SPARK_HOME/jars/" \
&& chmod -R 777 "$SPARK_HOME/jars"
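
To double-check the result before starting Kylin, something like the following can be used. It is a small sketch that only verifies that the jars directory exists and holds at least one .jar file; the function name spark_jars_ready is hypothetical, and it does not check that the jars are the right ones.

```shell
# spark_jars_ready [DIR]
# Returns 0 if DIR (default: $SPARK_HOME/jars) exists and contains a .jar file.
spark_jars_ready() {
    jars_dir=${1:-${SPARK_HOME:-}/jars}
    [ -d "$jars_dir" ] || return 1
    for jar in "$jars_dir"/*.jar; do
        # If the glob matched nothing, $jar is the literal pattern and -e fails.
        if [ -e "$jar" ]; then
            return 0
        fi
    done
    return 1
}

# Intended use on the CDH server (commented out):
#   spark_jars_ready && echo "Spark jars are in place"
```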

Start Kylin (CDH server)

  1. Log in to the CDH server via SSH and run su - hdfs to switch to the hdfs account;
  2. As officially recommended, first run the environment check: $KYLIN_HOME/bin/check-env.sh
  3. If the check passes, the console output is as follows:
    Insert picture description here
  4. Start Kylin: $KYLIN_HOME/bin/kylin.sh start
  5. The following console output indicates that Kylin started successfully:
    Insert picture description here
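
The check and the start can also be chained so that Kylin is only started when the environment check passes. A minimal sketch, assuming check-env.sh exits with a non-zero status when the check fails; the wrapper name start_kylin is hypothetical and is not part of Kylin.

```shell
# start_kylin [KYLIN_HOME]
# Runs the environment check first and starts Kylin only if it passes.
# ASSUMPTION: check-env.sh signals failure through a non-zero exit status.
start_kylin() {
    kylin_home=${1:-${KYLIN_HOME:-}}
    if [ -z "$kylin_home" ]; then
        echo "KYLIN_HOME is not set" >&2
        return 1
    fi
    if ! "$kylin_home/bin/check-env.sh"; then
        echo "environment check failed; not starting Kylin" >&2
        return 1
    fi
    "$kylin_home/bin/kylin.sh" start
}

# Intended use on the CDH server as the hdfs account (commented out):
#   start_kylin
```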

Log in to Kylin

  1. Open http://192.168.50.134:7070/kylin in a browser, as shown below. The account is ADMIN and the password is KYLIN (both in uppercase):
    Insert picture description here
  2. After a successful login, Kylin is ready to use:
    Insert picture description here
    At this point, the deployment, setup, and startup of CDH and Kylin are complete, and Kylin is available. In the next article, we will run Kylin's official demo in this environment to try Kylin out.

You are welcome to follow my WeChat official account: programmer Xinchen


Origin blog.csdn.net/boling_cavalry/article/details/105449952