- Problem solutions
cd to the Hadoop installation path, into etc/hadoop
Modify hdfs-site.xml and add the following:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
The purpose is to disable the permission check. This is needed because, when Eclipse on the Windows machine is configured to connect to the Hadoop server, the following error is reported after the Map/Reduce connection is set up: org.apache.hadoop.security.AccessControlException: Permission denied
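If you would rather not turn off permission checking on the cluster, a commonly used client-side alternative (an assumption here, not part of the original setup) is to tell the Hadoop client which remote user to act as before any FileSystem object is created, for example at the top of main():
// Hypothetical alternative to dfs.permissions=false: act as the HDFS user (e.g. root)
System.setProperty("HADOOP_USER_NAME", "root");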
Modify hdfs-site.xml and add the following content (in the current test this turned out to be optional; apply it according to your situation):
<property>
<name>dfs.web.ugi</name>
<value>228238,supergroup</value>
</property>
The reason is that the following error is reported at run time: WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user 228238 (228238 is the Windows machine user name)
After the configuration is modified, restart the hadoop cluster:
[root@supervisor-84 sbin]# ./stop-dfs.sh
[root@supervisor-84 sbin]# ./stop-yarn.sh
[root@supervisor-84 sbin]# ./start-dfs.sh
[root@supervisor-84 sbin]# ./start-yarn.sh
- Windows basic environment preparation
Windows 7 (x64), JDK, Ant, Eclipse, Hadoop
jdk environment configuration
After installing jdk-6u26-windows-i586.exe, configure the JAVA_HOME environment variable and add its bin directory to PATH
eclipse environment configuration
Extract eclipse-standard-luna-SR1-win32.zip to F:\eclipse\
Download address: http://developer.eclipsesource.com/technology/epp/luna/eclipse-standard-luna-SR1-win32.zip
ant environment configuration
Extract apache-ant-1.9.4-bin.zip to D:\apache-ant\, set the ANT_HOME environment variable, and add its bin directory to PATH
Download address: http://mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.4-bin.zip
Download hadoop-2.5.2.tar.gz
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz
Download hadoop-2.5.2-src.tar.gz
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2-src.tar.gz
Download hadoop2x-eclipse-plugin
https://github.com/winghc/hadoop2x-eclipse-plugin
Download hadoop-common-2.2.0-bin
https://github.com/srccodes/hadoop-common-2.2.0-bin
Download hadoop-2.5.2.tar.gz, hadoop-2.5.2-src.tar.gz, hadoop2x-eclipse-plugin and hadoop-common-2.2.0-bin, and unzip each of them into the F:\hadoop\ directory
Note: after decompression, htrace-core-3.0.4.jar is missing from hadoop-2.5.2\share\hadoop\common\lib; you can download it from the Internet and put it in that directory.
- Compiling hadoop-eclipse-plugin-2.5.2.jar
Add the environment variable HADOOP_HOME=F:\hadoop\hadoop-2.5.2\
Append %HADOOP_HOME%\bin to the PATH environment variable
Modify the version information of the compiled package and dependent packages
Modify F:\hadoop\hadoop2x-eclipse-plugin-master\ivy\libraries.properties:
hadoop.version=2.5.2
jackson.version=1.9.13
ant compile
F:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin>
ant jar -Dversion=2.5.2 -Declipse.home=F:\eclipse\eclipse-hadoop\eclipse -Dhadoop.home=F:\hadoop\hadoop-2.5.2
After compiling, hadoop-eclipse-plugin-2.5.2.jar will be in the F:\hadoop\hadoop2x-eclipse-plugin-master\build\contrib\eclipse-plugin directory
- eclipse environment configuration
1. Copy the compiled hadoop-eclipse-plugin-2.5.2.jar into the plugins directory of Eclipse, then restart Eclipse.
2. Open the menu Window -> Preferences -> Hadoop Map/Reduce and configure it, as shown in the following figure:
3. Display the Hadoop connection configuration view: Window -> Show View -> Other -> MapReduce Tools, as shown in the following figure:
4. Configure the connection to Hadoop, as shown in the following figure:
5. Check whether the connection is successful: create a new folder and upload a file; if you can see information similar to the following, the connection is successful:
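The same check can also be done from code. Below is a minimal sketch using the HDFS FileSystem API; the NameNode address hdfs://192.168.68.84:9000, the local file F:/test/words.txt and the HDFS paths are assumptions, adjust them to your own cluster:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectionTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; use the fs.defaultFS value of your own cluster
        conf.set("fs.defaultFS", "hdfs://192.168.68.84:9000");
        FileSystem fs = FileSystem.get(conf);
        // Create a folder and upload a file, mirroring the manual check above
        fs.mkdirs(new Path("/user/228238/input"));
        fs.copyFromLocalFile(new Path("F:/test/words.txt"), new Path("/user/228238/input/words.txt"));
        // List the folder to confirm the upload succeeded
        for (FileStatus status : fs.listStatus(new Path("/user/228238/input"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}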
- Map/Reduce Project creation
Right-click in the Project Explorer and select New -> Other -> Map/Reduce Project
Next, fill in the MapReduce project name, for example "WordCountProject", and click "Finish".
At this point the MapReduce project has been created successfully, and the newly created project appears on the left side of Eclipse.
To get log output in the Eclipse console, create a log4j.properties file in the src directory with the following contents:
log4j.rootLogger=debug,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=DEBUG
Note: put winutils.exe and hadoop.dll (both can be found in the hadoop-common-2.2.0-bin package mentioned above) into the F:\hadoop\hadoop-2.5.2\bin directory
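If the program still complains that it cannot find winutils.exe when launched from Eclipse, a workaround that is often used (again an assumption, not part of the original steps) is to point hadoop.home.dir at the local Hadoop directory at the very beginning of main():
// Hypothetical workaround when the Eclipse-launched JVM does not pick up HADOOP_HOME
System.setProperty("hadoop.home.dir", "F:\\hadoop\\hadoop-2.5.2");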
In DFS Locations, create a new 228238 folder (the PC user name) under the user directory, and create an input folder inside it; after the job runs, the results will be in the newout directory.
Create a new class WordCount, package name: org.apache.hadoop.examples
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one); }
}
}
public static class IntSumReducer extends
Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
// Initialize Configuration
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "192.168.68.84:9001");
String[] ars=new String[]{"input","newout"};
// GenericOptionsParser class, which is used to explain common hadoop commands,
// And set the corresponding value for the Configuration object as needed. In fact, we don't often use it in normal development.
// Instead, let the class implement the Tool interface, and then use ToolRunner to run the program in the main function,
// And ToolRunner will call GenericOptionsParser internally
String[] otherArgs = new GenericOptionsParser(conf, ars).getRemainingArgs();
// There must be two parameters when running the WordCount program, if not, it will report an error and exit
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount ");
System.exit(2);
}
// building a job
Job job = new Job(conf, "word count");
// Load the prepared calculation program
job.setJarByClass(WordCount.class);
// load map function
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
// Load the reduce function implementation class
job.setReducerClass(IntSumReducer.class);
// Define the type of the output key/value
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// Build the input data file
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
// Build the output data file
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
// If the job runs successfully, our program will exit normally
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Right-click the class and select Run As -> Run on Hadoop
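As a quick sanity check (the file contents here are only an example, not from the original run): if the input folder contains a single file with the line "hello hadoop hello world", the part-r-00000 file under newout should contain:
hadoop	1
hello	2
world	1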
Debug run:
Modify the mapred-site.xml file and add the following configuration:
<property>
<name>mapred.child.java.opts</name>
<value>-agentlib:jdwp=transport=dt_socket,address=8883,server=y,suspend=y</value>
</property>
Right-click the hadoop src project, choose "Debug As" -> "Debug Configurations", select "Remote Java Application", add a new configuration, enter the remote host IP and the listening port (for example 8883), and then click the "Debug" button. Because the agent options above use suspend=y, the task JVM will wait until the debugger attaches.