Building a Hadoop environment on a Windows system - from scratch

Step 1: Install a Linux virtual machine on the Windows system

1. First install VMware Workstation; version 12.5.6 is used here;

2. Download the CentOS 6.3 image, CentOS-6.3-x86_64-bin-DVD1.iso;

3. Create a virtual machine

Step 2: Configure the network

1. Set the IP and other information of the local network card

2. Set up the virtual machine network

(1) Modify ifcfg-eth0 file information

The command is as follows:

        vi /etc/sysconfig/network-scripts/ifcfg-eth0

The modification information is as follows:
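A minimal static-IP setup looks roughly like the following; the address, netmask, gateway, and DNS values are examples (192.168.0.105 matches the address used later in this guide) and must be adapted to your own network:

        DEVICE=eth0
        TYPE=Ethernet
        ONBOOT=yes
        BOOTPROTO=static
        IPADDR=192.168.0.105
        NETMASK=255.255.255.0
        GATEWAY=192.168.0.1
        DNS1=192.168.0.1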

Restart the network after modification: service network restart

Turn off the firewall: service iptables stop (temporary)

                     chkconfig iptables off (permanent)

3. Modify the hosts file

        vi /etc/hosts

Add the IP address entry for the meritdata host (the hostname used throughout this guide), as shown below.
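Assuming the virtual machine's address is 192.168.0.105 (the address used in the monitoring URLs later), the added line would look like:

        192.168.0.105   meritdata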

Step 3: Install jdk (root user)

1. Use ftp to upload the pre-configured files to the /bigdata folder

2. Unzip jdk: tar -vxf jdk-7u79-linux-x64.tar.gz

3. Create a soft link: ln -s /bigdata/jdk1.7.0_79 /bigdata/java

After creation, verify the link with ls -l /bigdata; the java entry should point to jdk1.7.0_79.

4. Configure the environment variables of jdk

vi /etc/profile

Add the following code:

        export JAVA_HOME=/bigdata/java

        export CLASSPATH=/bigdata/java/lib

        export PATH=$JAVA_HOME/bin:$PATH

source /etc/profile

5. Check whether the jdk is installed successfully
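A quick way to verify (the exact build number in the output may differ):

        java -version

It should report something like: java version "1.7.0_79".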

Step 4: Create a new user

su - root

adduser bigdata

passwd bigdata

cd /

chown -R bigdata:bigdata /bigdata/

Step 5: Set up SSH password-free login (bigdata user)

Switch to the newly created user:

        su - bigdata

Generate public and private keys:

        ssh-keygen -t rsa

Import the public key into this machine:                                                       

        cd ~/.ssh

        cat id_rsa.pub > authorized_keys

Change permissions:

        chmod 600 authorized_keys

Test:

        ssh meritdata (i.e. ssh <hostname>). The first login may ask you to type yes to confirm; after that you can log in directly without a password.

Step 6: Install hadoop (under bigdata account)

1. Unzip hadoop

    Log in to the bigdata user: su - bigdata

    Enter the /bigdata directory: cd /bigdata

    Unzip hadoop: tar -vxf hadoop-2.6.0-cdh5.8.2.tar.gz

    Unzip hive: tar -vxf hive-1.1.0-cdh5.8.2.tar.gz

    Create hadoop soft link: ln -s hadoop-2.6.0-cdh5.8.2 hadoop

    Create hive soft link: ln -s hive-1.1.0-cdh5.8.2/ hive

2. Configure environment variables

    Edit profile file:

        vi ~/.bash_profile

    Add java and hadoop configuration information:

        export JAVA_HOME=/bigdata/java

        export PATH=$JAVA_HOME/bin:$PATH

        export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

        export HADOOP_HOME=/bigdata/hadoop

        export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH

    Configure hadoop in the /etc/profile file:

         Switch to root user

         vi /etc/profile

              Add: export HADOOP_HOME=/bigdata/hadoop

                        export PATH=$HADOOP_HOME/bin:$PATH
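    To apply the settings and confirm they are picked up, the standard hadoop CLI can be used (it should print the 2.6.0-cdh5.8.2 version once PATH is correct):

        source /etc/profile

        source ~/.bash_profile

        hadoop version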

3. Modify hadoop related xml files

Go to the configuration directory: cd /bigdata/hadoop/etc/hadoop

(1) vi core-site.xml

<configuration>

  <property>

   <name>fs.defaultFS</name>

    <value>hdfs://meritdata:8020</value>  <!-- meritdata is this machine's hostname -->

  </property>

  <property>

   <name>hadoop.tmp.dir</name>

   <value>/bigdata/hadoop/tmp</value>

  </property>
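  <!-- Replace $SERVER_USER in the two proxyuser properties below with the user that runs the Hadoop/Hive services (bigdata in this guide) -->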

  <property>

    <name>hadoop.proxyuser.$SERVER_USER.hosts</name>

    <value>*</value>

  </property>

  <property>

   <name>hadoop.proxyuser.$SERVER_USER.groups</name>

    <value>*</value>

  </property>

</configuration>

(2) vi hdfs-site.xml

<configuration>

<property>

    <name>dfs.replication</name>

    <value>1</value>

</property>

</configuration>

(3) vi mapred-site.xml

There is no mapred-site.xml file in the folder, so use the following command:

cp mapred-site.xml.template mapred-site.xml

 <configuration>

 <property>

   <name>mapreduce.framework.name</name>

     <value>yarn</value>

 </property>

</configuration>

(4) vi yarn-site.xml

<configuration>

        <property>

               <name>yarn.nodemanager.aux-services</name>

               <value>mapreduce_shuffle</value>

        </property>

        <property>

                <name>yarn.scheduler.minimum-allocation-mb</name>

                <value>1024</value>

        </property>

        <property>

               <name>yarn.scheduler.maximum-allocation-mb</name>

               <value>10240</value>

        </property>
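        <!-- 92160 MB (90 GB) assumes a large physical host; lower yarn.nodemanager.resource.memory-mb to match the memory actually assigned to the VM -->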

        <property>

               <name>yarn.nodemanager.resource.memory-mb</name>

               <value>92160</value>

       </property>

</configuration>

(5) vi hadoop-env.sh (only the JAVA_HOME line needs to be changed; the full file is reproduced below for reference)

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/bigdata/jdk1.7.0_79

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options.  Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol.  This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored.  $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
#       the user that will run the hadoop daemons.  Otherwise there is the
#       potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

4. Format the NameNode

/bigdata/hadoop/bin/hdfs namenode -format

5. Start hadoop

sbin/start-all.sh
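Once the daemons are up, jps (shipped with the JDK) should list the single-node processes:

        jps

Typical output includes NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and Jps.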

6. You can use hdfs to create folders

Create a folder: hadoop fs -mkdir /user

Check if the folder was created successfully: hadoop fs -ls /

         Monitor hadoop through the web UIs (192.168.0.105 is the VM's IP address):

                   http://192.168.0.105:50070 (HDFS NameNode)

                   http://192.168.0.105:8088/cluster (YARN ResourceManager)
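As an extra smoke test, any local file can be copied into HDFS and read back (/etc/hosts is used here purely as an example):

        hadoop fs -put /etc/hosts /user

        hadoop fs -cat /user/hosts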

Step 7: Install mysql (under root user)

1. Check whether there is mysql installed by default in the system

Case-insensitive view: rpm -qa | grep -i mysql

Delete existing: rpm -e MySQL-devel-5.5.20-1.el6.x86_64 --nodeps

2. Unzip mysql

tar -vxf MySQL-5.5.20-1.el6.x86_64.tar

3. Install mysql by rpm

rpm -ivh MySQL-client-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-devel-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-embedded-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-server-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-shared-5.5.20-1.el6.x86_64.rpm

rpm -ivh MySQL-test-5.5.20-1.el6.x86_64.rpm

4. cp /usr/share/mysql/my-medium.cnf /etc/my.cnf

5. vi /etc/my.cnf

[client]

default-character-set=utf8

[mysqld]

character-set-server=utf8

lower_case_table_names=1

max_connections=1000

max_allowed_packet = 100M

[mysql]

default-character-set=utf8

Note: the settings listed above are the additions/changes made to the copied my-medium.cnf.

6. Put the mysql-connector-java-5.1.41-bin.jar driver file in hive's lib folder (/bigdata/hive/lib)

7. Start the mysql service

         service mysql start

8. Create a password for the root user of mysql

Create password: mysqladmin -uroot password 'root123'

    Log in to mysql: mysql -uroot -p

         View encoding: show variables like 'character%';

9. Create hive database and user

Create hive database: create database hive default character set latin1;

Create hive user: create user hive identified by 'hive';

Authorize the hive user: grant all privileges on hive.* to 'hive'@'meritdata' identified by 'hive';

grant all privileges on hive.* to 'hive'@'%' identified by 'hive';

Refresh: flush privileges;
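To confirm the new account and grants work, log in as the hive user with the password configured above:

        mysql -uhive -phive

        mysql> show databases;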

Step 8: Configure hive

1. Configure the hive-site.xml file

cd /bigdata/hive/conf

vi hive-site.xml

The configuration is as follows:

<configuration>

<!--metastore--> 

  <property>

   <name>datanucleus.autoCreateTables</name>

    <value>true</value>

  </property>

    <property>

   <name>javax.jdo.option.ConnectionURL</name>

   <value>jdbc:mysql://meritdata:3306/hive?createDatabaseIfNotExist=true</value>

  </property>

  <property>

   <name>javax.jdo.option.ConnectionUserName</name>

    <value>hive</value>

  </property>

  <property>

   <name>javax.jdo.option.ConnectionPassword</name>

    <value>hive</value>

  </property>

  <property>

   <name>javax.jdo.option.ConnectionDriverName</name>

   <value>com.mysql.jdbc.Driver</value>

  </property>

  <property>

   <name>hive.metastore.uris</name>

   <value>thrift://meritdata:9083</value>

  </property>

  <property>

   <name>hive.metastore.warehouse.dir</name>

    <value>/hive</value>

  </property>

<!--hiveserver2--> 

  <property>

    <name>hive.server2.thrift.bind.host</name>

    <value>meritdata</value>

  </property>

  <property>

   <name>hive.server2.thrift.port</name>

    <value>10000</value>

  </property>

 

</configuration>

2. Configure environment variables

vi ~/.bash_profile

Add code:

        export HIVE_HOME=/bigdata/hive

        export PATH=$HIVE_HOME/bin:$PATH

        export HIVE_CONF=$HIVE_HOME/conf

        export HCAT_HOME=$HIVE_HOME/hcatalog

source ~/.bash_profile

3. Start hive

Create the logs folder: mkdir /bigdata/hive/logs

Start hive:

        nohup hive --service metastore > /bigdata/hive/logs/metastore.log 2>&1 &

        nohup hive --service hiveserver2 > /bigdata/hive/logs/hiveserver2.log 2>&1 &

4. Execute hive test

Hive's SQL usage is basically the same as MySQL's; a quick sanity check is shown below.
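For example, a minimal check from the hive command line (the table name is arbitrary; the table data is stored under /hive in HDFS, as set by hive.metastore.warehouse.dir):

        hive

        hive> create table test_tb (id int);

        hive> show tables;

        hive> drop table test_tb;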
