Cloud Server Pseudo-Distributed Hadoop Configuration + Java API Upload, Download, and Delete Operations (A Summary of Pitfalls)


Note: this post covers installing and configuring Hadoop on a single cloud server. The setup uses an Alibaba Cloud ECS instance running CentOS 7, with public IP 120.27.244.176 and private IP 172.16.236.135; the security group opens all ports (1-65535).


1. Preparation

  1. Create the directory: mkdir -p /opt/hadoop
  2. Enter it: cd /opt/hadoop
  3. Upload: transfer hadoop-2.7.3.tar.gz into this directory (e.g. with scp or an SFTP tool)
  4. Unzip it: tar -zxvf hadoop-2.7.3.tar.gz
  5. Configure hosts:
vi /etc/hosts
# the private (internal) IP must be used here
172.16.236.135 hadoop.idse.top
  6. Configure environment variables:
vi /etc/profile
# add the following environment variables
# Hadoop configuration
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
# reload the configuration
source /etc/profile
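
Before moving on, it is worth checking that the hostname resolves to the private IP and that the hadoop command is on the PATH:

ping -c 1 hadoop.idse.top   # should resolve to 172.16.236.135
hadoop version              # should report Hadoop 2.7.3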

2. Configure Hadoop

  1. Modify hadoop-env.sh:
cd /opt/hadoop/hadoop-2.7.3/etc/hadoop/
vi hadoop-env.sh
# find and modify this line (point it at your own JDK)
export JAVA_HOME=/opt/java/jdk1.8.0_141

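A quick sanity check that JAVA_HOME points at a real JDK (the path below is the one used in this post; adjust it to your own installation):

ls /opt/java/jdk1.8.0_141/bin/java
# if this prints "No such file or directory", fix JAVA_HOME first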

  2. Modify core-site.xml:
vi core-site.xml
<!-- note: change the hostname and port to match your own hosts file -->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://hadoop.idse.top:9000</value>
</property>
<!-- directory where Hadoop stores files generated at runtime -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/zhiyou/hadoop/tmp</value>
</property>


  3. Configure hdfs-site.xml (replication factor 1, since there is only one DataNode):
vi hdfs-site.xml
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>
  4. Configure yarn-site.xml:
vi yarn-site.xml
<!-- note: change the hostname to your own -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>hadoop.idse.top</value>
</property>
<!-- how reducers fetch data -->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>


  5. Configure mapred-site.xml:
mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>
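
One step that is easy to miss: before the very first start-up, the HDFS NameNode has to be formatted once, otherwise the NameNode process will not come up:

# run once, before the first start (repeating it wipes HDFS metadata)
hdfs namenode -format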
  6. Start Hadoop:
cd /opt/hadoop/hadoop-2.7.3/sbin/
./start-all.sh
  7. Check the running processes with jps (expected output shown after this list)
  8. Access the web UI in a browser (public IP + port): 120.27.244.176:50070
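With all daemons up, jps should show the following (process IDs will vary):

jps
# should list: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager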

Pit 1: if the page cannot be reached, open the required ports in the security group. Search for the ports Hadoop needs (for Hadoop 2.x the common defaults include 50070 for the NameNode web UI, 9000 for HDFS RPC, 8088 for the YARN web UI, and 50010/50075 for the DataNode); the author simply opened all of them.
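
To check on the server which ports the daemons are actually listening on (useful before blaming the security group), assuming ss is available as on stock CentOS 7:

ss -tlnp | grep java
# 50070 (NameNode UI) and 9000 (HDFS RPC) should appear in the list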

3. Java API upload, download, and delete tests

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

@Test // upload
public void upload() throws IOException, InterruptedException, URISyntaxException {
	// 1. instantiate the Configuration
	Configuration conf = new Configuration();
	/* required when the client connects to a cloud server */
	conf.set("dfs.replication", "1");
	conf.set("dfs.client.use.datanode.hostname", "true");
	// 2. get the file system (connect via the public IP, as user root)
	FileSystem fs = FileSystem.get(new URI("hdfs://120.27.244.176:9000/"), conf, "root");
	// 3. upload: fs.copyFromLocalFile(local file, destination path)
	fs.copyFromLocalFile(new Path("C:\\Users\\NEVER\\Desktop\\day04-06\\day04_05fastJson的使用.avi"), new Path("/day04_05fastJson的使用.avi"));
	// 4. close the connection
	fs.close();
}

@Test // download
public void download() throws IOException, InterruptedException, URISyntaxException {
	// 1. instantiate the Configuration
	Configuration conf = new Configuration();
	conf.set("dfs.replication", "1");
	conf.set("dfs.client.use.datanode.hostname", "true");
	// 2. get the file system
	FileSystem fs = FileSystem.get(new URI("hdfs://120.27.244.176:9000/"), conf, "root");
	// 3. download: fs.copyToLocalFile(delSrc, remote path, local path, useRawLocalFileSystem)
	// useRawLocalFileSystem=true writes directly, without a .crc checksum file
	// (a common workaround on Windows clients without winutils)
	fs.copyToLocalFile(false, new Path("/c.jpg"), new Path("E:/d.jpg"), true);
	// 4. close the connection
	fs.close();
}

@Test // delete
public void remove() throws IOException, InterruptedException, URISyntaxException {
	// 1. instantiate the Configuration
	Configuration conf = new Configuration();
	// 2. get the file system
	FileSystem fs = FileSystem.get(new URI("hdfs://120.27.244.176:9000/"), conf, "root");
	// fs.delete(path, recursive): with recursive=false only files and empty directories can be deleted
	fs.delete(new Path("/dow3.txt"), true);
	fs.close();
}
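
These methods are JUnit tests; a sketch of how they might be run, assuming they sit in a Maven project with the hadoop-client 2.7.3 and JUnit dependencies, in a test class named HdfsTest (the class name here is hypothetical):

# run the whole test class, or a single method
mvn test -Dtest=HdfsTest
mvn test -Dtest=HdfsTest#upload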

Pit 2: conf.set("dfs.replication", "1"); and conf.set("dfs.client.use.datanode.hostname", "true"); must both be set, otherwise the upload produces an empty file. The reason: by default the NameNode hands the client the DataNode's private IP (172.16.236.135), which an external client cannot reach, so the file entry is created in HDFS but no data blocks are ever written. With dfs.client.use.datanode.hostname set to true, the client connects by hostname instead (the client machine must be able to resolve hadoop.idse.top to the public IP, e.g. via its own hosts file).
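
A quick way to confirm on the server that the upload actually carried data and did not hit Pit 2:

hdfs dfs -ls /
# the uploaded file should be listed with a non-zero size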

Origin blog.csdn.net/qq_39231769/article/details/102826962