Install the Java runtime environment
1. Information about the experimental machine:
[root@node2 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@node2 ~]# uname -r
3.10.0-327.el7.x86_64
2. Configure the EPEL repository and install OpenJDK
yum search java | grep -i JDK
yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
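If the EPEL repository is not enabled yet, it can be added first (a minimal sketch; on CentOS 7 the epel-release package is provided by the Extras repository):
yum install -y epel-release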
3. Set the JAVA_HOME environment variable
[root@node2 ~]# cat /etc/profile.d/java_home.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
export PATH=$PATH:$JAVA_HOME/bin
Make the configuration take effect:
source /etc/profile.d/java_home.sh    # or: . /etc/profile.d/java_home.sh
4. Test whether java is installed and configured successfully
[root@node2 ~]# java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
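A quick additional check that JAVA_HOME resolves and that the javac compiler from the -devel package is available:
echo $JAVA_HOME
javac -version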
5. Create a small Java program, compile it, and print hello world
[root@node2 ~]# cat helloworld.java
public class helloworld {
    public static void main(String[] args) {
        System.out.println("hello world!");
    }
}
[root@node2 ~]# javac helloworld.java    # compilation produces the class file helloworld.class
[root@node2 ~]# java helloworld          # run it
hello world!
- How do you run Java applications packaged as .jar or .war files?
java -jar /path/to/*.jar [arg1] [arg2]
A .war file is normally deployed into a servlet container such as Tomcat rather than launched directly with java -jar.
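As an illustration, the helloworld class from the previous step could be packaged into a runnable jar and started the same way (a sketch; the helloworld.jar file name is just illustrative):
# create a jar with helloworld as the manifest Main-Class entry
jar cfe helloworld.jar helloworld helloworld.class
java -jar helloworld.jar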
#############################################################################
Next, get to know Hadoop. Official website: http://hadoop.apache.org/
What is Apache Hadoop?
The Apache™ Hadoop® project develops open source software for reliable, scalable distributed computing.
The Apache Hadoop software library is a framework that allows distributed processing of large data sets across clusters of computers using a simple programming model.
It is designed to scale from a single server to thousands of machines, each offering local computation and storage.
Rather than relying on hardware to provide high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may fail.
Running Hadoop in standalone (local) mode
Download the binary package from the official website, extract it into the /usr/local directory, create a soft link named hadoop in the same directory, and configure the PATH variable so the commands take effect.
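A minimal sketch of those steps, assuming Hadoop 3.1.0 (the version used by the examples jar later in this article); the download mirror is an assumption, so adjust the URL and version to your environment:
cd /usr/local
# fetch and unpack the binary tarball
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
tar xf hadoop-3.1.0.tar.gz
# version-independent soft link so the PATH entry below stays stable
ln -sv hadoop-3.1.0 hadoop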
[jerry@node2 ~]$ cat /etc/profile.d/hadoop.sh
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
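Hadoop also needs to locate Java. Because JAVA_HOME was exported in /etc/profile.d/java_home.sh above, it is picked up from the environment; it can also be pinned in Hadoop's own environment file (a sketch, reusing the JDK path configured earlier):
# optional: set JAVA_HOME explicitly for Hadoop
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh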
[root@node2 ~]# hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
 or    hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
  where CLASSNAME is a user-provided Java class

  OPTIONS is none or any of:

buildpaths                       attempt to add class files from build tree
--config dir                     Hadoop config directory
--debug                          turn on shell script debug mode
--help                           usage information
hostnames list[,of,host,names]   hosts to use in slave mode
hosts filename                   list of hosts to use in slave mode
loglevel level                   set the log4j level for this command
workers                          turn on worker mode

  SUBCOMMAND is one of:

    Admin Commands:

daemonlog     get/set the log level for each daemon

    Client Commands:

archive       create a Hadoop archive
checknative   check native Hadoop and compression libraries availability
classpath     prints the class path needed to get the Hadoop jar and the required libraries
conftest      validate configuration XML files
credential    interact with credential providers
distch        distributed metadata changer
distcp        copy file or directories recursively
dtutil        operations related to delegation tokens
envvars       display computed Hadoop environment variables
fs            run a generic filesystem user client
gridmix       submit a mix of synthetic job, modeling a profiled from production load
jar <jar>     run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath       prints the java.library.path
kdiag         Diagnose Kerberos Problems
kerbname      show auth_to_local principal conversion
key           manage keys via the KeyProvider
rumenfolder   scale a rumen input trace
rumentrace    convert logs into a rumen trace
s3guard       manage metadata on S3
trace         view and modify Hadoop tracing settings
version       print the version

    Daemon Commands:

kms           run KMS, the Key Management Server

SUBCOMMAND may print help when invoked w/o parameters or with -h.
The default Hadoop configuration runs in non-distributed (local) mode as a single Java process, which is convenient for debugging. To get a feel for Hadoop in action you can run one of the bundled examples (the examples jar ships WordCount, grep and others); the grep run below takes the files in the input folder as input, counts the occurrences of words matching the regular expression wo[a-z.]+, and writes the results to the output folder.
If you need to run it again, delete the output folder first (Hadoop does not overwrite an existing output directory by default):
# cd /usr/local/hadoop/
# mkdir input
# cp etc/hadoop/*.xml input
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'wo[a-z.]+'
# cat output/*
1 work
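Before writing to the same output directory again, remove it first, as noted above. The examples jar also bundles other programs such as wordcount; a sketch (the wc-output directory name is just illustrative):
rm -r output
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar wordcount input wc-output
cat wc-output/*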
Another run, this time counting occurrences of 'root' in /etc/passwd:
[root@node2 /usr/local/hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep /etc/passwd output 'root'
[root@node2 /usr/local/hadoop]# cat output/*