Cloud Server Pseudo-Distributed Hadoop Configuration + Java API Upload, Download, and Delete Operations (A Summary of Pitfalls)


Note: this post covers installing and configuring Hadoop on a single cloud server. The setup uses an Alibaba Cloud ECS instance running CentOS 7, with public IP 120.27.244.176 and private IP 172.16.236.135; the security group opens all ports (1-65535).


1. Preparation

  1. Create the directory: mkdir -p /opt/hadoop
  2. Enter it: cd /opt/hadoop
  3. Upload: transfer hadoop-2.7.3.tar.gz into this directory (e.g. with scp or an SFTP tool)
  4. Unzip it: tar -zxvf hadoop-2.7.3.tar.gz
  5. Configure hosts:
vi /etc/hosts
# the private (internal) IP must be used here
172.16.236.135 hadoop.idse.top
  6. Configure environment variables:
vi /etc/profile
# add the following environment variables
# Hadoop configuration
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
# reload the configuration
source /etc/profile
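
Before moving on, it is worth checking that the hostname resolves to the private IP and that the hadoop command is on the PATH:

ping -c 1 hadoop.idse.top   # should resolve to 172.16.236.135
hadoop version              # should report Hadoop 2.7.3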

2. Configure Hadoop

  1. Modify hadoop-env.sh:
cd /opt/hadoop/hadoop-2.7.3/etc/hadoop/
vi hadoop-env.sh
# find and modify this line (point it at your own JDK)
export JAVA_HOME=/opt/java/jdk1.8.0_141

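A quick sanity check that JAVA_HOME points at a real JDK (the path below is the one used in this post; adjust it to your own installation):

ls /opt/java/jdk1.8.0_141/bin/java
# if this prints "No such file or directory", fix JAVA_HOME first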

  2. Modify core-site.xml:
vi core-site.xml
<!-- note: change the hostname and port to match your own hosts file -->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://hadoop.idse.top:9000</value>
</property>
<!-- directory where Hadoop stores files generated at runtime -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/zhiyou/hadoop/tmp</value>
</property>


  3. Configure hdfs-site.xml (replication factor 1, since there is only one DataNode):
vi hdfs-site.xml
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>
  4. Configure yarn-site.xml:
vi yarn-site.xml
<!-- note: change the hostname to your own -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>hadoop.idse.top</value>
</property>
<!-- how reducers fetch data -->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>


  5. Configure mapred-site.xml:
mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>
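
One step that is easy to miss: before the very first start-up, the HDFS NameNode has to be formatted once, otherwise the NameNode process will not come up:

# run once, before the first start (repeating it wipes HDFS metadata)
hdfs namenode -format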
  6. Start Hadoop:
cd /opt/hadoop/hadoop-2.7.3/sbin/
./start-all.sh
  7. Check the running processes with jps (expected output shown after this list)
  8. Access the web UI in a browser (public IP + port): 120.27.244.176:50070
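With all daemons up, jps should show the following (process IDs will vary):

jps
# should list: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager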

Pit 1: if the page cannot be reached, open the required ports in the security group. Search for the ports Hadoop needs (for Hadoop 2.x the common defaults include 50070 for the NameNode web UI, 9000 for HDFS RPC, 8088 for the YARN web UI, and 50010/50075 for the DataNode); the author simply opened all of them.
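
To check on the server which ports the daemons are actually listening on (useful before blaming the security group), assuming ss is available as on stock CentOS 7:

ss -tlnp | grep java
# 50070 (NameNode UI) and 9000 (HDFS RPC) should appear in the list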

3. Java API upload, download, and delete tests

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

@Test // upload
public void upload() throws IOException, InterruptedException, URISyntaxException {
	// 1. instantiate the Configuration
	Configuration conf = new Configuration();
	/* required when the client connects to a cloud server */
	conf.set("dfs.replication", "1");
	conf.set("dfs.client.use.datanode.hostname", "true");
	// 2. get the file system (connect via the public IP, as user root)
	FileSystem fs = FileSystem.get(new URI("hdfs://120.27.244.176:9000/"), conf, "root");
	// 3. upload: fs.copyFromLocalFile(local file, destination path)
	fs.copyFromLocalFile(new Path("C:\\Users\\NEVER\\Desktop\\day04-06\\day04_05fastJson的使用.avi"), new Path("/day04_05fastJson的使用.avi"));
	// 4. close the connection
	fs.close();
}

@Test // download
public void download() throws IOException, InterruptedException, URISyntaxException {
	// 1. instantiate the Configuration
	Configuration conf = new Configuration();
	conf.set("dfs.replication", "1");
	conf.set("dfs.client.use.datanode.hostname", "true");
	// 2. get the file system
	FileSystem fs = FileSystem.get(new URI("hdfs://120.27.244.176:9000/"), conf, "root");
	// 3. download: fs.copyToLocalFile(delSrc, remote path, local path, useRawLocalFileSystem)
	// useRawLocalFileSystem=true writes directly, without a .crc checksum file
	// (a common workaround on Windows clients without winutils)
	fs.copyToLocalFile(false, new Path("/c.jpg"), new Path("E:/d.jpg"), true);
	// 4. close the connection
	fs.close();
}

@Test // delete
public void remove() throws IOException, InterruptedException, URISyntaxException {
	// 1. instantiate the Configuration
	Configuration conf = new Configuration();
	// 2. get the file system
	FileSystem fs = FileSystem.get(new URI("hdfs://120.27.244.176:9000/"), conf, "root");
	// fs.delete(path, recursive): with recursive=false only files and empty directories can be deleted
	fs.delete(new Path("/dow3.txt"), true);
	fs.close();
}
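
These methods are JUnit tests; a sketch of how they might be run, assuming they sit in a Maven project with the hadoop-client 2.7.3 and JUnit dependencies, in a test class named HdfsTest (the class name here is hypothetical):

# run the whole test class, or a single method
mvn test -Dtest=HdfsTest
mvn test -Dtest=HdfsTest#upload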

Pit 2: conf.set("dfs.replication", "1"); and conf.set("dfs.client.use.datanode.hostname", "true"); must both be set, otherwise the upload produces an empty file. The reason: by default the NameNode hands the client the DataNode's private IP (172.16.236.135), which an external client cannot reach, so the file entry is created in HDFS but no data blocks are ever written. With dfs.client.use.datanode.hostname set to true, the client connects by hostname instead (the client machine must be able to resolve hadoop.idse.top to the public IP, e.g. via its own hosts file).
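
A quick way to confirm on the server that the upload actually carried data and did not hit Pit 2:

hdfs dfs -ls /
# the uploaded file should be listed with a non-zero size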

Origin blog.csdn.net/qq_39231769/article/details/102826962