Hadoop + Eclipse integration

  • Problem resolution (server-side configuration)

cd <hadoop installation path>/etc/hadoop

Modify hdfs-site.xml and add the following:

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

The purpose is to disable HDFS permission checking. This resolves the error that appears when Eclipse on a Windows machine connects to the Hadoop server: after the Map/Reduce connection is configured, the following is reported: org.apache.hadoop.security.AccessControlException: Permission denied.
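If you would rather not disable permission checking cluster-wide, a client-side alternative is to make the job run as a user that owns the HDFS files. A minimal sketch, assuming the files belong to the user "root" (the class name is illustrative; the property must be set before the first FileSystem access):

public class HadoopUserOverride {
  public static void main(String[] args) throws Exception {
    // Assumption: the HDFS files are owned by user "root" on the cluster.
    // Hadoop 2.x reads HADOOP_USER_NAME from the environment or, failing that,
    // from a Java system property, so set it before any FileSystem call.
    System.setProperty("HADOOP_USER_NAME", "root");
    // ... continue with normal job setup here ...
  }
}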

 

Also modify hdfs-site.xml and add the following content (optional; apply it according to your situation):

<property>
  <name>dfs.web.ugi</name>
  <value>228238,supergroup</value>
</property>

This addresses the following warning reported at run time: WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user 228238 (228238 is the Windows machine's username).

After modifying the configuration, restart the Hadoop cluster:

[root@supervisor-84 sbin]# ./stop-dfs.sh

[root@supervisor-84 sbin]# ./stop-yarn.sh

[root@supervisor-84 sbin]# ./start-dfs.sh

[root@supervisor-84 sbin]# ./start-yarn.sh

 

  • Windows basic environment preparation

Windows 7 (x64), JDK, Ant, Eclipse, Hadoop

JDK environment configuration

After installing jdk-6u26-windows-i586.exe, set the JAVA_HOME environment variable and add the JDK's bin directory to PATH.

 

Eclipse environment configuration

Extract eclipse-standard-luna-SR1-win32.zip to F:\eclipse\

Download address: http://developer.eclipsesource.com/technology/epp/luna/eclipse-standard-luna-SR1-win32.zip

 

Ant environment configuration

Extract apache-ant-1.9.4-bin.zip to D:\apache-ant\, set the ANT_HOME environment variable, and add the bin directory to PATH.

Download address: http://mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.4-bin.zip

 

Download hadoop-2.5.2.tar.gz

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz

Download hadoop-2.5.2-src.tar.gz

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2-src.tar.gz 

 

Download hadoop2x-eclipse-plugin 

https://github.com/winghc/hadoop2x-eclipse-plugin 

 

Download hadoop-common-2.2.0-bin

https://github.com/srccodes/hadoop-common-2.2.0-bin 

 

Unzip hadoop-2.5.2.tar.gz, hadoop-2.5.2-src.tar.gz, hadoop2x-eclipse-plugin, and hadoop-common-2.2.0-bin into the F:\hadoop\ directory.

 

Note: after decompression, htrace-core-3.0.4.jar is missing from hadoop-2.5.2\share\hadoop\common\lib; download it from the Internet and place it in that directory.

 

  • Compiling hadoop-eclipse-plugin-2.5.2.jar

Add environment variable HADOOP_HOME=F:\hadoop\hadoop-2.5.2\ 

Append %HADOOP_HOME%\bin to the PATH environment variable.

Modify the version numbers of the plugin and its dependencies in F:\hadoop\hadoop2x-eclipse-plugin-master\ivy\libraries.properties:

hadoop.version=2.5.2 

jackson.version=1.9.13

Ant compilation

Run the following from F:\hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin:

ant jar -Dversion=2.5.2 -Declipse.home=F:\eclipse\eclipse-hadoop\eclipse -Dhadoop.home=F:\hadoop\hadoop-2.5.2

After compilation, hadoop-eclipse-plugin-2.5.2.jar will be in the F:\hadoop\hadoop2x-eclipse-plugin-master\build\contrib\eclipse-plugin directory.

 

  • Eclipse plugin and connection configuration

1. Copy the compiled hadoop-eclipse-plugin-2.5.2.jar into Eclipse's plugins directory, then restart Eclipse.

 

2. Open the menu Window → Preferences → Hadoop Map/Reduce and set the Hadoop installation directory (for example F:\hadoop\hadoop-2.5.2).



3. Display the Hadoop connection configuration view: Window → Show View → Other → MapReduce Tools → Map/Reduce Locations.



 

4. Configure the connection to Hadoop: in the Map/Reduce Locations view, create a new Hadoop location and enter the Map/Reduce Master and DFS Master host and port values for the cluster.



5. To check whether the connection is successful, create a new folder and upload a file through DFS Locations; if the entries appear in the tree, the connection is working.
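The connection can also be verified programmatically. A minimal sketch, assuming the NameNode listens at hdfs://192.168.68.84:9000 (substitute the fs.defaultFS value from your cluster's core-site.xml; the class name is illustrative):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.68.84:9000"), conf);
    // List the root directory; if this prints entries, the connection works.
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
    fs.close();
  }
}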



 

  • Map/Reduce project creation

Right-click in the Project Explorer and select New → Other → Map/Reduce Project.



 

Next, enter "WordCountProject" as the MapReduce project name and click Finish.






 
 
The MapReduce project has now been created successfully and appears in the Project Explorer on the left side of Eclipse.

 

To get log output in the Eclipse console, create a log4j.properties file in the src directory with the following contents:

log4j.rootLogger=debug,stdout,R 

log4j.appender.stdout=org.apache.log4j.ConsoleAppender 

log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 

log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n 

log4j.appender.R=org.apache.log4j.RollingFileAppender 

log4j.appender.R.File=mapreduce_test.log 

log4j.appender.R.MaxFileSize=1MB 

log4j.appender.R.MaxBackupIndex=1 

log4j.appender.R.layout=org.apache.log4j.PatternLayout 

log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n 

log4j.logger.com.codefutures=DEBUG
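With these properties in place, log statements from the job code show up in the Eclipse console and roll into mapreduce_test.log. A small sketch using commons-logging, which Hadoop itself uses (the class name is illustrative):

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LoggingSketch {
  private static final Log LOG = LogFactory.getLog(LoggingSketch.class);

  public static void main(String[] args) {
    LOG.debug("visible in the console via the stdout appender");
    LOG.info("also written to mapreduce_test.log by the R appender");
  }
}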



Note: put winutils.exe and hadoop.dll (found in the hadoop-common-2.2.0-bin download mentioned above) into the F:\hadoop\hadoop-2.5.2\bin directory.
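If Hadoop still cannot locate winutils.exe (typically a "Could not locate executable null\bin\winutils.exe" error), a common client-side workaround is to point the hadoop.home.dir system property at the directory containing bin\winutils.exe before the job starts. A sketch (the class name is illustrative):

public class HadoopHomeFix {
  public static void main(String[] args) throws Exception {
    // Hadoop's Shell class checks the hadoop.home.dir system property
    // (falling back to the HADOOP_HOME environment variable) when it
    // looks for bin\winutils.exe on Windows.
    System.setProperty("hadoop.home.dir", "F:\\hadoop\\hadoop-2.5.2");
    // ... continue with normal job setup here ...
  }
}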

In DFS Locations, create a new 228238 folder (the PC user name) under the user directory, and create a new input folder inside it; after the job runs, the results appear in the newout directory.
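The same folders can also be created and populated from code instead of through DFS Locations. A minimal sketch, assuming the NameNode URI hdfs://192.168.68.84:9000 and a local sample file at F:\hadoop\sample.txt (both are assumptions to substitute with your own values):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrepareInput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.68.84:9000"), conf);
    // Create /user/228238/input and copy a local sample file into it.
    fs.mkdirs(new Path("/user/228238/input"));
    fs.copyFromLocalFile(new Path("F:/hadoop/sample.txt"),
                         new Path("/user/228238/input/sample.txt"));
    fs.close();
  }
}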



Create a new class WordCount in package org.apache.hadoop.examples:

 

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

 

  public static class TokenizerMapper

       extends Mapper<Object, Text, Text, IntWritable>{

     

    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {

      StringTokenizer itr = new StringTokenizer(value.toString());

      while (itr.hasMoreTokens()) {

        word.set(itr.nextToken());

        context.write(word, one);
      }
    }

  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {

      int sum = 0;

      for (IntWritable val : values) {
        sum += val.get();
      }

      result.set(sum);

      context.write(key, result);
    }
  }

 public static void main(String[] args) throws Exception {

// Initialize Configuration

    Configuration conf = new Configuration();

    conf.set("mapred.job.tracker", "192.168.68.84:9001");

    String[] ars=new String[]{"input","newout"};

// GenericOptionsParser parses the common Hadoop command-line options
// and sets the corresponding values on the Configuration object.
// In practice it is more common to have the class implement the Tool interface
// and run the program through ToolRunner in main(),
// since ToolRunner calls GenericOptionsParser internally.

    String[] otherArgs = new GenericOptionsParser(conf, ars).getRemainingArgs();

// The WordCount program requires exactly two arguments; otherwise it reports an error and exits

    if (otherArgs.length != 2) {

      System.err.println("Usage: wordcount <in> <out>");

      System.exit(2);

    }

// building a job

    Job job = new Job(conf, "word count");

// Load the prepared calculation program

    job.setJarByClass(WordCount.class);

// load map function

    job.setMapperClass(TokenizerMapper.class);

    job.setCombinerClass(IntSumReducer.class);

// Load the reduce function implementation class

    job.setReducerClass(IntSumReducer.class);

// Define the type of the output key/value

    job.setOutputKeyClass(Text.class);

    job.setOutputValueClass(IntWritable.class);

// Build the input data file

    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));

// Build the output data file

    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

// If the job runs successfully, our program will exit normally

    System.exit(job.waitForCompletion(true) ? 0 : 1);

  }

}
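The comments in main() mention the Tool/ToolRunner pattern. A minimal sketch of that variant, reusing the mapper and reducer above (this is an illustration with a hypothetical class name, not part of the original example):

package org.apache.hadoop.examples;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "word count");
    job.setJarByClass(WordCountTool.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setCombinerClass(WordCount.IntSumReducer.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner calls GenericOptionsParser internally, so -D options,
    // -fs and -jt are handled before run() receives the remaining args.
    System.exit(ToolRunner.run(new Configuration(), new WordCountTool(), args));
  }
}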

 

 

Right-click the class and choose Run As → Run on Hadoop.


 

 

Debugging the job:

Modify the mapred-site.xml file and add the following configuration:

<property>
  <name>mapred.child.java.opts</name>
  <value>-agentlib:jdwp=transport=dt_socket,address=8883,server=y,suspend=y</value>
</property>

Right-click the hadoop src project, choose Debug As → Debug Configurations, select Remote Java Application, add a new configuration, enter the remote host IP and the listening port (8883 in this example), and click the Debug button. Since suspend=y is set, the child JVM waits for the debugger to attach before it starts running.



 

 
