Introductory Hadoop: common interview questions and cluster time synchronization

1. Commonly used port numbers

Hadoop3.x :

HDFS NameNode internal communication (RPC) port: 8020 / 9000 / 9820

HDFS NameNode web UI port for users: 9870

YARN web UI port for viewing MapReduce job execution: 8088

History server port: 19888

Hadoop2.x:

HDFS NameNode internal communication (RPC) port: 8020 / 9000

HDFS NameNode web UI port for users: 50070

YARN web UI port for viewing MapReduce job execution: 8088

History server port: 19888
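As a quick way to memorize and verify these ports, the following is a minimal Python sketch that records the values listed above and tests whether each one accepts TCP connections. The names `HADOOP_PORTS`, `is_port_open`, and `check_cluster_ports` are made up for this illustration, and 8020 is assumed as the NameNode RPC port (the real value depends on `fs.defaultFS` in your cluster).

```python
import socket

# Default ports from the lists above, keyed by Hadoop major version.
# Assumption: the NameNode RPC port is 8020; your fs.defaultFS may use 9000 or 9820.
HADOOP_PORTS = {
    "3.x": {"namenode_rpc": 8020, "namenode_http": 9870,
            "yarn_http": 8088, "jobhistory_http": 19888},
    "2.x": {"namenode_rpc": 8020, "namenode_http": 50070,
            "yarn_http": 8088, "jobhistory_http": 19888},
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def check_cluster_ports(host, version="3.x"):
    """Map each service name to True/False depending on whether its port answers."""
    return {name: is_port_open(host, port)
            for name, port in HADOOP_PORTS[version].items()}


# Example (hadoop102 is the host name used later in this article):
# print(check_cluster_ports("hadoop102", "3.x"))
```

Note that in a typical cluster the services run on different hosts, so each port should be checked against the node that actually runs the corresponding daemon.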

2. Common configuration files:

Hadoop3.x:

core-site.xml

hdfs-site.xml

yarn-site.xml

mapred-site.xml

workers

Hadoop2.x:

core-site.xml

hdfs-site.xml

yarn-site.xml

mapred-site.xml

slaves
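The only difference between the two lists is the name of the worker list file (`workers` in 3.x, `slaves` in 2.x). A small Python sketch, assuming the files live in the usual `$HADOOP_HOME/etc/hadoop` directory, can report which of them are missing. The function names are hypothetical helpers for this illustration.

```python
from pathlib import Path

# The four site files are shared by both versions.
COMMON_FILES = ["core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml"]

# The file listing worker nodes was renamed between 2.x and 3.x.
WORKER_LIST_FILE = {"3.x": "workers", "2.x": "slaves"}


def expected_config_files(version):
    """Return the configuration file names listed above for the given version."""
    return COMMON_FILES + [WORKER_LIST_FILE[version]]


def missing_config_files(conf_dir, version):
    """Return the expected files that do not exist in conf_dir."""
    conf = Path(conf_dir)
    return [name for name in expected_config_files(version)
            if not (conf / name).is_file()]


# Example (the path is an assumption; adjust it to your installation):
# print(missing_config_files("/opt/module/hadoop/etc/hadoop", "3.x"))
```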

Cluster time synchronization:

If the servers are on a public network (they can reach the Internet), cluster time synchronization may be unnecessary, because the servers regularly calibrate their clocks against public time servers.

If the servers are on an intranet, cluster time synchronization must be configured. Otherwise the clocks drift apart over time, and the cluster executes tasks out of sync.

Pick one machine as the time server, and have all other machines synchronize with it at regular intervals. In production, choose the synchronization interval according to how time-sensitive the tasks are. In the test environment here, the machines synchronize every minute so that the effect is visible quickly.
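To make the idea of "time deviation" concrete, here is a rough Python sketch that asks a time server for its clock using a minimal SNTP request and compares the answer with the local clock. It is not what `ntpd` or `ntpdate` do internally: it ignores network delay, so the result is only an approximate offset. The function names are invented for this example.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def build_request():
    """Build a 48-byte SNTP client request (LI=0, version 3, mode 3 -> 0x1B)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def clock_offset(server, timeout=2.0):
    """Approximate (server time - local time) in seconds, ignoring network delay."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))  # NTP uses UDP port 123
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet) - time.time()


# Example, once the time server described below is running:
# print(clock_offset("hadoop102"))
```

A value close to zero means the node agrees with the time server; a growing value on an intranet node is the drift that the scheduled synchronization is meant to correct.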

Time server configuration (must be root user):

(1) Check the ntpd service status and boot self-start status of all nodes

systemctl status ntpd ------ Check whether the ntpd service is running

systemctl start ntpd ------ Start the ntpd service

systemctl is-enabled ntpd ------ Check whether the ntpd service is set to start at boot

(2) Modify the ntp.conf configuration file of hadoop102

Edit the configuration file to set which hosts may synchronize with this server:

vim /etc/ntp.conf

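Add the following at the end, so that the server falls back to its own local clock as the time source when it has no network connection: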

server 127.127.1.0
fudge 127.127.1.0 stratum 10

Uncomment the following line and change the network segment to your own, so that hosts on this subnet can query the time server:

restrict 192.168.10.0 mask 255.255.255.0 nomodify notrap

Comment out the default server lines that follow it, so that the cluster does not use time sources on the Internet.
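The `restrict` line above pairs a network address with a mask, and a client is covered by it when its IP falls inside that range. As a rough illustration (this is a plain subnet check written for this article, not ntpd's own matching code), the standard `ipaddress` module can show which hosts the rule applies to. The function name `matches_restrict` and the sample client addresses are assumptions.

```python
import ipaddress


def matches_restrict(client_ip, network="192.168.10.0", mask="255.255.255.0"):
    """Return True if client_ip lies inside the network/mask of the restrict line."""
    subnet = ipaddress.ip_network(f"{network}/{mask}")
    return ipaddress.ip_address(client_ip) in subnet


# A host on the 192.168.10.x segment is covered by the rule...
print(matches_restrict("192.168.10.103"))
# ...while a host on another segment is not.
print(matches_restrict("192.168.11.103"))
```

If your cluster uses a different address range, the network and mask in `ntp.conf` need to be changed accordingly.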

(3) Modify the /etc/sysconfig/ntpd file of hadoop102

vim /etc/sysconfig/ntpd

Add the following line, so that the hardware clock is kept in sync with the system time:

SYNC_HWCLOCK=yes

(4) Restart the ntpd service

systemctl restart ntpd

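Stop the ntpd service and disable its start at boot on all other nodes (not on the time server hadoop102); ntpdate cannot run while ntpd is using the NTP port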

systemctl stop ntpd
systemctl disable ntpd

On the other machines, configure synchronization with the time server once every minute

crontab -e

Add a scheduled task:

*/1 * * * * /usr/sbin/ntpdate hadoop102
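The first field of the crontab entry is the minute field, and `*/1` means "every minute". The following Python sketch, which handles only the three simplest forms of that field (it is not a full cron parser), shows how such a field expands into concrete minutes and what a longer interval would look like. The function name is hypothetical.

```python
def expand_minute_field(field):
    """Expand a cron minute field of the form '*', '*/n', or a single number."""
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        # Every step-th minute, starting from minute 0.
        return list(range(0, 60, step))
    return [int(field)]


# '*/1' fires on every minute of the hour, exactly like '*'.
print(expand_minute_field("*/1") == expand_minute_field("*"))
# A less aggressive schedule, e.g. for production: every 10 minutes.
print(expand_minute_field("*/10"))
```

Synchronizing every minute suits the test environment; a larger step reduces load when the tasks do not need such tight accuracy.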

That concludes the introduction to Hadoop. Next, follow along as we move on to HDFS!


Origin blog.csdn.net/m0_61469860/article/details/129463711