Big data | (1) Hadoop pseudo-distributed installation

Big Data Principles and Applications Textbook Link: Big Data Technology Principles and Applications Electronic Courseware - Edited by Lin Ziyu

Hadoop pseudo-distributed installation reference article: Hadoop pseudo-distributed installation - more detailed than textbooks

Big Data | (2) SSH connection error "Permission denied" reference: SSH Connection Error Permission denied

Hello everyone! This issue brings you the pseudo-distributed installation of Hadoop.

With the advent of the big data era, "big data" has become a buzzword in the Internet information technology industry.

With the development of Hadoop, Hadoop has gradually become synonymous with big data. 

1. Overview of Hadoop

1.1 Introduction to Hadoop

Hadoop is an open source distributed computing platform under the Apache Software Foundation, which provides users with a distributed infrastructure with transparent details of the underlying system.

Hadoop is developed in the Java language, has good cross-platform support, is open source, and can be deployed on inexpensive computer clusters. The core of Hadoop is HDFS (Hadoop Distributed File System) and MapReduce (a programming model).

1.2 Hadoop Characteristics

Hadoop is a software framework capable of distributed processing of large amounts of data, and it is processed in a reliable, efficient, and scalable manner. It has the following characteristics:

  • High reliability. Even if one replica fails, the other replicas can still provide services normally.
  • High efficiency. Hadoop adopts the two core technologies of distributed storage and distributed processing, and can efficiently process PB-level data.
  • High scalability. Hadoop can scale to thousands of computer nodes.
  • High fault tolerance. Redundant data storage is adopted, and multiple copies of data are saved automatically.
  • Low cost. Hadoop runs on inexpensive computer clusters.
  • Runs on Linux. Hadoop is developed in Java and runs well on Linux systems.
  • Support for multiple programming languages. Applications on Hadoop can also be written in other languages such as C++.

1.3 Current status of Hadoop application

Domestic companies using Hadoop mainly include Baidu, Taobao, Netease, Huawei, China Mobile, etc. Among them, Taobao has a relatively large computer cluster.

1.4 Hadoop version

The Apache Hadoop version is divided into three generations, namely Hadoop 1.0, Hadoop 2.0, and Hadoop 3.0. In addition to the free and open source Apache Hadoop, there are also Hadoop distributions launched by commercial companies. In 2008, Cloudera became the first Hadoop commercialization company, and it launched the first Hadoop distribution in 2009.

2. Hadoop Ecosystem

After years of development, the Hadoop ecosystem has continuously improved and matured, and it currently includes multiple sub-projects. In addition to the core HDFS and MapReduce, the Hadoop ecosystem also includes ZooKeeper, HBase, Hive, Pig, Mahout, Flume, Sqoop, Ambari, and other functional components.

3. Installation and use of Hadoop

3.1 Update apt and install vim editor

First update the package with the following command:

sudo apt-get update

 Then install the Vim editor:

sudo apt-get install vim

3.2 Install SSH and configure SSH password-free login

Install SSH-Server using the following command:

sudo apt-get install openssh-server

Then you can use the following command (entering the password when prompted) to log in to the local machine:

ssh localhost

Enter the following command to log out:

exit

Use the command to enter the following directory:

cd ~/.ssh/

Generate public and private keys:

ssh-keygen -t rsa

Run ls at this point and you will see the generated key files (id_rsa and id_rsa.pub) in the folder. Append the public key to the list of authorized keys:

cat ./id_rsa.pub >> ./authorized_keys

Then use the following command to log in directly, with no password required:

ssh localhost

If you encounter a "Permission denied" error during SSH password-free login, please refer to the blogger's separate article, which is split out for space and for your convenience: SSH Connection Error Permission denied

3.3 Install the Java environment

If you have already installed the JDK, you can use the following command to view JAVA_HOME (the JDK installation path), run java, javac, etc. to verify it, and then skip this step.

echo $JAVA_HOME

If you have not installed JDK before, please continue to look down.

First download the JDK 8 archive from the official website or the blogger's Baidu network disk:

Official website download address: JDK 8 Linux compressed package download address

Baidu network disk download address: Baidu network disk JDK 8 Linux compressed package download address

Transfer to the Linux system via xftp or lrzsz, and decompress to the current folder:

tar -xzvf jdk-8u202-linux-x64.tar.gz

Configure environment variables:

vim ~/.bashrc

Press i to enter insert mode and add the JAVA_HOME environment variable settings at the beginning of the file:
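The exact lines are not reproduced here, but they set JAVA_HOME to the JDK path. A minimal sketch, assuming the archive was unpacked to the home directory (adjust the path to wherever you extracted it):

```shell
# JDK installation path -- adjust to your actual extraction directory
export JAVA_HOME=~/jdk1.8.0_202
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
```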

Press Esc, then type :wq and press Enter to save and exit.

Refresh configuration:

source ~/.bashrc

Use the following command to test whether the installation is successful:

java -version

If version information similar to java version "1.8.0_202" is printed, the installation is successful!

3.4 Install Hadoop on a single machine

To download Hadoop, you can download it from the official website or the blogger’s Baidu network disk. The Hadoop version selected here is 3.1.3.
Hadoop official website download: Hadoop official website download address

Baidu network disk download address: Hadoop Baidu network disk download address

Then upload the installation package to the Linux server, and use the following command to decompress it:

tar -xzvf hadoop-3.1.3.tar.gz

After decompression you get a hadoop-3.1.3 folder, but here the blogger renames it for convenience.

Rename the file command:

mv hadoop-3.1.3 hadoop

Now, from inside the hadoop directory, you can run the following command to check whether Hadoop was installed successfully:

./bin/hadoop version

 At this point, the installation of Hadoop is complete, and the pseudo-distributed installation of Hadoop is performed below (important!)

3.5 Hadoop pseudo-distributed installation

First modify two configuration files, namely core-site.xml file and hdfs-site.xml file, enter the etc/hadoop directory under the hadoop directory, and perform the following operations.

Modify the content of the core-site.xml file as follows:
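The file contents are not shown here; a typical pseudo-distributed core-site.xml (assuming Hadoop was unpacked to /usr/local/hadoop; adjust hadoop.tmp.dir to your actual path) looks like this:

```xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```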

Modify the content of the hdfs-site.xml file as follows:
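Similarly, a typical pseudo-distributed hdfs-site.xml sets a single replica and the NameNode/DataNode storage directories (again assuming Hadoop lives at /usr/local/hadoop; adjust the paths to your installation):

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
```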

Enter the hadoop directory, and then execute the following command:

./bin/hdfs namenode -format

After execution, if no Java-style error appears, the Hadoop pseudo-distributed installation is successful!

Note that the NameNode should normally be formatted only once; re-formatting can cause the NameNode and DataNode cluster IDs to mismatch. The blogger has already formatted it before, so the execution result is not demonstrated here.
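After formatting succeeds, the usual next step (not covered in this post) is to start HDFS and verify the daemons with jps; a sketch, assuming you are in the hadoop directory:

```shell
# Start the NameNode and DataNode daemons
./sbin/start-dfs.sh

# List running Java processes; NameNode, DataNode and
# SecondaryNameNode should all appear if startup succeeded
jps

# Stop the daemons when finished
./sbin/stop-dfs.sh
```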

Attach some commands that may be used during the operation:

Check file permissions:

ls -l filename

User operation:

List all users:

cat /etc/passwd

Delete a user:

sudo userdel -r username

Add a user:

sudo useradd -m username

Switch user:

su username

Write at the end:

As a technology that has emerged only in recent years, big data has an important impact on scientific research, ways of thinking, social development, the job market, and personnel training. I hope you can start your big data journey from this Hadoop installation milestone. Let's encourage each other!


Origin blog.csdn.net/qq_62592360/article/details/129270915