Setting up a Hadoop environment on CentOS 7 is a common task. Here is a simple tutorial:
- Install Java:
  Hadoop is written in Java, so a JDK must be installed first. You can install Java on CentOS 7 by following these steps:
  - Download the Java JDK (Java Development Kit) tarball for Linux.
  - Extract the tarball to a directory of your choice.
  - Set the JAVA_HOME environment variable to that directory.
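The Java steps above might look like the following in a shell. This is a minimal sketch, not the only way to install a JDK: `JDK_TARBALL` and `JAVA_PREFIX` are hypothetical names chosen for illustration, and the script assumes the archive contains a single top-level directory, which is how JDK tarballs are usually packaged.

```shell
# Sketch: unpack a JDK tarball and point JAVA_HOME at it.
# JDK_TARBALL and JAVA_PREFIX are hypothetical; set them to match your download.
JDK_TARBALL="${JDK_TARBALL:-$HOME/jdk-linux-x64.tar.gz}"
JAVA_PREFIX="${JAVA_PREFIX:-$HOME/opt/java}"

mkdir -p "$JAVA_PREFIX"

if [ -f "$JDK_TARBALL" ]; then
    # --strip-components=1 drops the archive's versioned top-level directory,
    # so the JDK's bin/ ends up directly under $JAVA_PREFIX.
    tar -xzf "$JDK_TARBALL" -C "$JAVA_PREFIX" --strip-components=1
else
    echo "JDK tarball not found at $JDK_TARBALL; download it first" >&2
fi

# Make the JDK visible to the current shell.
# Add these two lines to ~/.bashrc to persist them across logins.
export JAVA_HOME="$JAVA_PREFIX"
export PATH="$JAVA_HOME/bin:$PATH"

# Report the Java version if the JDK was unpacked successfully.
if [ -x "$JAVA_HOME/bin/java" ]; then
    "$JAVA_HOME/bin/java" -version
fi
```

Installing under the home directory avoids the need for root. A system-wide location such as a directory under /opt works the same way but requires sudo for the extraction step.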
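- Download and extract Hadoop:
  - Visit the official Hadoop website and download the binary tarball of the latest stable Hadoop release. The same tarball is used on every Linux distribution; there is no CentOS-specific build.
  - Extract the Hadoop tarball to a directory of your choice.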
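As a rough illustration of the download step, the sketch below builds the tarball URL and wraps the download and extraction in a function. The version number, `INSTALL_DIR`, and the function name `fetch_hadoop` are assumptions made for this example, not values from the tutorial; check the Hadoop website for the current stable release before using it.

```shell
# Sketch: download and extract a Hadoop binary tarball.
# HADOOP_VERSION and INSTALL_DIR are illustrative assumptions; adjust both.
HADOOP_VERSION="${HADOOP_VERSION:-3.3.6}"
INSTALL_DIR="${INSTALL_DIR:-$HOME/opt}"
HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}"

# fetch_hadoop is a hypothetical helper name: it downloads the tarball
# to /tmp and extracts it under INSTALL_DIR.
fetch_hadoop() {
    mkdir -p "$INSTALL_DIR" || return 1
    # -f fails on HTTP errors instead of saving an error page; -L follows redirects.
    curl -fL -o "/tmp/$HADOOP_TARBALL" "$HADOOP_URL" || return 1
    tar -xzf "/tmp/$HADOOP_TARBALL" -C "$INSTALL_DIR"
}

echo "Hadoop would be fetched from: $HADOOP_URL"
```

Calling `fetch_hadoop` performs the download. The extracted tree should then be at `$INSTALL_DIR/hadoop-$HADOOP_VERSION`, which is the path to use for HADOOP_HOME.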
- Configure Hadoop environment variables:
  - Open the ~/.bashrc file and add the following lines:
    export HADOOP_HOME=/path/to/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  - Run the following command for the environment variables to take effect:
    source ~/.bashrc
- Configure the Hadoop cluster:
  - Enter the Hadoop configuration directory:
    cd $HADOOP_HOME/etc/hadoop
  - Edit the hadoop-env.sh file to set JAVA_HOME to your Java installation path:
    export JAVA_HOME=/path/to/java
  - Edit the core-site.xml file to configure Hadoop's core settings, such as the default file system URI and its port.
  - Edit the hdfs-site.xml file to configure the Hadoop Distributed File System (HDFS), such as the data directories and the number of replicas.
  - Edit the mapred-site.xml file to configure MapReduce, such as the execution framework and job resource settings.
  - Edit the yarn-site.xml file to configure YARN, such as the ResourceManager, NodeManager, and resource allocation settings.
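To make the four XML files more concrete, here is a minimal sketch that generates one property per file for a single-machine setup. The property names are standard Hadoop settings, but the values (port 9000, a replication factor of 1) are assumptions suited only to a single node, and `OUT_DIR` and `write_site` are hypothetical names used for this example.

```shell
# Sketch: generate minimal single-node configuration files in a staging directory.
# OUT_DIR is a hypothetical staging location, not the live Hadoop config directory.
OUT_DIR="${OUT_DIR:-./hadoop-conf}"
mkdir -p "$OUT_DIR"

# write_site <file> <property-name> <value>
# Writes a Hadoop XML configuration file containing a single property.
write_site() {
    cat > "$OUT_DIR/$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Default file system URI; the host and port are assumptions for a local setup.
write_site core-site.xml   fs.defaultFS                  hdfs://localhost:9000
# One replica only makes sense on a single machine.
write_site hdfs-site.xml   dfs.replication               1
# Run MapReduce jobs on YARN.
write_site mapred-site.xml mapreduce.framework.name      yarn
# Auxiliary shuffle service that MapReduce needs on each NodeManager.
write_site yarn-site.xml   yarn.nodemanager.aux-services mapreduce_shuffle

ls -l "$OUT_DIR"
```

After reviewing the generated files, copy them into `$HADOOP_HOME/etc/hadoop`. Writing to a staging directory first avoids overwriting an existing configuration by accident. A real multi-node cluster will need more properties than these.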
- Start the Hadoop cluster:
  - Format HDFS (only before the first start, because formatting erases existing NameNode metadata):
    hdfs namenode -format
  - Start HDFS:
    start-dfs.sh
  - Start YARN:
    start-yarn.sh
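Because reformatting an existing NameNode destroys its metadata, the startup sequence can be wrapped in a guard. The sketch below is one possible approach: `format_once` and `start_cluster` are hypothetical helper names, and the default `NAME_DIR` assumes that dfs.namenode.name.dir and hadoop.tmp.dir were left at their defaults.

```shell
# Sketch: format HDFS only if it has not been formatted yet, then start the daemons.
# NAME_DIR assumes the default NameNode storage location; override it if you
# set dfs.namenode.name.dir or hadoop.tmp.dir yourself.
NAME_DIR="${NAME_DIR:-/tmp/hadoop-$(id -un)/dfs/name}"

# format_once (hypothetical helper): skip formatting when a formatted
# NameNode directory, identified by its current/VERSION file, already exists.
format_once() {
    if [ -f "$NAME_DIR/current/VERSION" ]; then
        echo "NameNode already formatted at $NAME_DIR; skipping format"
    else
        hdfs namenode -format
    fi
}

# start_cluster (hypothetical helper): run the steps in order and stop at
# the first failure. The start scripts typically launch daemons over SSH,
# so passwordless SSH to localhost should already be working.
start_cluster() {
    format_once && start-dfs.sh && start-yarn.sh
}
```

Calling `start_cluster` runs the whole sequence. Running it a second time leaves the existing HDFS data in place instead of formatting again.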
- Verify the Hadoop cluster:
  - Open a web browser and visit http://localhost:8088 to confirm that the YARN ResourceManager is running.
  - Check the status of HDFS:
    hdfs dfsadmin -report
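On a server without a desktop browser, the same checks can be run from the command line. This is a small sketch under the assumption that curl is installed; `check_ui` and `verify_cluster` are hypothetical helper names.

```shell
# Sketch: command-line equivalents of the verification steps.

# check_ui <url> (hypothetical helper): report whether a web UI answers.
check_ui() {
    # -f treats HTTP errors as failures; --max-time avoids hanging on a dead port.
    if curl -fsS -o /dev/null --max-time 5 "$1" 2>/dev/null; then
        echo "reachable: $1"
    else
        echo "not reachable: $1"
        return 1
    fi
}

# verify_cluster (hypothetical helper): run all checks in sequence.
verify_cluster() {
    check_ui "http://localhost:8088"   # YARN ResourceManager web UI
    hdfs dfsadmin -report              # HDFS capacity and DataNode status
    jps                                # Java processes, which should include the Hadoop daemons
}
```

Calling `verify_cluster` after startup prints the result of each check. If the ResourceManager line says "not reachable", the YARN logs under `$HADOOP_HOME/logs` are a good place to look first.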
These are the basic steps to set up a Hadoop environment on CentOS 7. Depending on your needs and environment, additional configuration and tuning may be required. Before exposing the cluster on a network, make sure you understand your network environment and security requirements, and take appropriate security measures.