About Big Data

1. Overview

1.1 Introduction to Big Data

"Big data" refers to data sets so large and complex that they cannot be captured, managed, and processed with traditional database tools or conventional software. Big data technology is the ability to quickly extract valuable information from large volumes of data of widely varying types. Technologies suited to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, and scalable storage systems.

Apache's Hadoop project provides reliable, scalable, open-source distributed computing software. The Apache Hadoop software library is a framework that enables distributed processing of large data sets across clusters of computers using a relatively simple programming model.

Hadoop is designed to scale from a single server to clusters of thousands of machines, with each individual server providing its own local computation and storage.

As for high availability, the Hadoop library itself performs failure detection and handling at the application layer rather than relying on the underlying hardware, so it can deliver a highly available service on top of a cluster whose individual machines may fail and need repair.

1.2 Introduction to the TV Data Platform

The TV Data Platform is a distributed Hadoop cluster configuration and management tool developed on top of Ambari. It simplifies cluster provisioning through a step-by-step cluster setup wizard. It also includes a monitoring component, Ambari-Metrics, which collects pre-configured key operations metrics together with service, host, and cluster status information and displays them through a web interface. From it we can see directly whether Hadoop Core (HDFS and MapReduce) and related projects (such as HBase, Hive, and HCatalog) are healthy. The user interface is intuitive, so users can easily and effectively review information about the cluster and control it.

The platform supports visualization and analysis of jobs and tasks, making dependencies and performance easier to see. It exposes complete monitoring information through a RESTful API, allowing integration with existing operations and maintenance tools. The platform uses Ganglia to collect metrics and Nagios to support alerting.
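As an illustration of that RESTful API, cluster information can be queried with plain HTTP requests. The snippet below is a hedged sketch: the server address and default admin:admin credentials come from the setup later in this guide, while the cluster name "cluster1" is a hypothetical example.

```shell
# Base URL of the Ambari REST API (server address from this guide's setup).
AMBARI="http://10.0.0.131:8080/api/v1"

# List the clusters managed by this Ambari server. The call will fail
# outside the lab environment, hence the "|| true" guard.
command -v curl >/dev/null && curl -s -u admin:admin "$AMBARI/clusters" || true

# Fetch the state of the HDFS service in a cluster named "cluster1"
# (hypothetical cluster name).
command -v curl >/dev/null && \
  curl -s -u admin:admin "$AMBARI/clusters/cluster1/services/HDFS?fields=ServiceInfo/state" || true
```

The same endpoints are what the platform's monitoring integration consumes.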

Ambari itself has a distributed architecture consisting mainly of two parts, Ambari Server and Ambari Agent, as shown in Figure 1-1. Ambari Server reads the Stack and Service configuration files. When a cluster is created with Ambari, Ambari Server passes the Stack and Service configuration files and the Service lifecycle control scripts to the Ambari Agents. After an Agent receives the configuration files, it downloads and installs the packages from a public repository (on Red Hat systems, via yum). Once installation completes, Ambari Server notifies the Agents to start the Services. Ambari Server then periodically sends commands to the Agents to check Service status; the Agents report back to the Server, and the results are presented in the Ambari GUI, so the user can conveniently see the various states of the cluster and perform the appropriate maintenance.
[Figure 1-1: Ambari Server and Ambari Agent architecture]

2. Basic environment configuration

This guide assembles a two-node distributed Hadoop cluster as an example. The operating system used here is CentOS 7; the nodes are as follows:

Hostname  RAM  Disk  IP address  Roles
master    8G   100G  10.0.0.131  Ambari-Server, Ambari-Agent
slave1    4G   100G  10.0.0.133  Ambari-Agent

2.1 Configuring the host name

# master
# hostnamectl set-hostname master
# hostname
master

# slave1
# hostnamectl set-hostname slave1
# hostname
slave1

2.2 Modifying the hosts file

master node:
[root@master ~]# cat /etc/hosts
10.0.0.131	master.hadoop master
10.0.0.133	slave1.hadoop

slave1 node:
[root@slave1 ~]# cat /etc/hosts
10.0.0.131	master.hadoop
10.0.0.133	slave1.hadoop slave1

Note: the hostname mappings use FQDN format.
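As a small sketch, the mappings above can be prepared in a snippet file and checked before being appended to /etc/hosts on each node (the file name hosts.snippet is only an example):

```shell
# Build the FQDN mappings used in this guide in a local snippet file.
cat > hosts.snippet <<'EOF'
10.0.0.131	master.hadoop master
10.0.0.133	slave1.hadoop slave1
EOF

# Sanity check: every line should carry a *.hadoop FQDN.
grep -c '\.hadoop' hosts.snippet

# Then append to /etc/hosts on each node (requires root):
# cat hosts.snippet >> /etc/hosts
```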

2.3 Modifying the yum sources

# master
Mount BigData-v2.2.iso under the /mnt directory, extract the ambari package from it to /opt, and configure an FTP service on the master node.
Note:
Installing the big data packages may pull in dependency packages, so a CentOS 7 yum repository must also be configured; the CentOS 7 repository from the IaaS environment can be used here.
# master & slave1
# cd /etc/yum.repos.d/
# rm -vf *

Configure the yum repositories:
# vi ambari.repo 
[centos7]
baseurl=ftp://192.168.2.10/centos7/
(Note: set the actual yum source according to the real environment; this one is a lab test setup.)
gpgcheck=0
enabled=1
name=centos
[ambari]
name=ambari
baseurl=ftp://10.0.3.61/ambari
(Note: set the actual yum source according to the real environment; this one is a lab test setup.)
enabled=1
gpgcheck=0

# master
# yum -y install httpd
Copy the HDP-2.6.1.0 and HDP-UTILS-1.1.0.21 folders from /mnt/ to /var/www/html/, then start the httpd service.
# systemctl enable httpd.service
# systemctl start httpd.service
# systemctl status httpd.service

2.4 Configuring ntp

master
[root@master]# yum install ntp -y
# vi /etc/ntp.conf
Comment out or delete the following four lines:
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
Add the following two lines:
server 127.127.1.0 
fudge 127.127.1.0 stratum 10
# systemctl enable ntpd
# systemctl start ntpd

slave1
[root@slave1 ~]# yum install ntpdate -y
[root@slave1 ~]# ntpdate master
[root@slave1 ~]# systemctl enable ntpdate
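The ntpdate unit above only syncs the clock once at boot. A common complement, sketched below, is a periodic re-sync from master via cron (the snippet file name is only an example; adapt the schedule to the real environment):

```shell
# Write a cron entry that re-syncs slave1's clock from master every hour.
echo '0 * * * * /usr/sbin/ntpdate master >/dev/null 2>&1' > cron.snippet
cat cron.snippet

# Install it for root on slave1 (requires the cron daemon):
# crontab cron.snippet
```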

2.5 Configuring SSH

# master & slave1
Check whether the two nodes can reach each other over SSH without a password; if not, configure passwordless SSH public key authentication as follows:
# yum install openssh-clients
# ssh-keygen
# ssh-copy-id master.hadoop
# ssh-copy-id slave1.hadoop

Log in to the remote hosts over SSH to verify that it works:
# ssh master.hadoop
# exit
# ssh slave1.hadoop
# exit

2.6 Disabling Transparent Huge Pages

The operating system runs a background process called khugepaged that continuously scans the memory used by all processes and, where possible, swaps 4 KB pages for huge pages. During this process, the various memory allocation operations require memory locks, which directly affects a program's memory access performance. Moreover, the process is transparent to applications and cannot be controlled at the application level, so programs optimized specifically for 4 KB pages may suffer random performance degradation.
# master & slave1

cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
This setting is lost after a reboot and must be applied again.
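Since the setting does not survive a reboot, one common workaround, sketched here, is to re-apply it from a boot script (the script name and the rc.local approach are assumptions, not part of the original setup):

```shell
# Build a small script that re-disables THP; CentOS 7 sysfs paths assumed.
cat > thp-disable.sh <<'EOF'
#!/bin/sh
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
EOF
chmod +x thp-disable.sh

# Append its contents to /etc/rc.d/rc.local so it runs at boot
# (requires root; rc.local itself must be executable on CentOS 7):
# cat thp-disable.sh >> /etc/rc.d/rc.local
# chmod +x /etc/rc.d/rc.local
```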

2.7 JDK installation configuration

# master 
[root@master ~]# mkdir /usr/jdk64
[root@master ~]# tar -zxvf /mnt/jdk-8u77-linux-x64.tar.gz -C /usr/jdk64/

[root@master ~]# vim /etc/profile
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
export PATH=$JAVA_HOME/bin:$PATH
[root@master ~]# source /etc/profile
[root@master ~]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)

slave1
[root@slave1 ~]# mkdir /usr/jdk64
[root@master ~]# scp /mnt/jdk-8u77-linux-x64.tar.gz 10.0.0.133:/root/
[root@slave1 ~]# tar -zxvf jdk-8u77-linux-x64.tar.gz -C /usr/jdk64/
# vi /etc/profile
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
export PATH=$JAVA_HOME/bin:$PATH
# source /etc/profile
# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
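A quick sanity check on either node is to confirm that the java binary on PATH really comes from the JAVA_HOME just configured (a sketch; outside this lab environment the check will report a different path):

```shell
# Re-create the profile settings from above and verify PATH resolution.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
export PATH=$JAVA_HOME/bin:$PATH

case "$(command -v java || true)" in
  "$JAVA_HOME"/bin/java) echo "java resolves to JAVA_HOME" ;;
  *) echo "java does not resolve to JAVA_HOME (expected outside the lab)" ;;
esac
```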

3. Configure ambari-server

master
[root@master ~]# yum install -y ambari-server

3.1 Installing MariaDB database

Install on master:
# yum install  mariadb mariadb-server mysql-connector-java

Start the service:
# systemctl enable mariadb
# systemctl start mariadb

Configure MySQL:
# mysql_secure_installation
Press Enter to confirm, then set the database root password; here we set it to "bigdata".
Remove anonymous users? [Y/n] y
Disallow root login remotely? [Y/n] n
Remove test database and access to it? [Y/n] y
Reload privilege tables now? [Y/n] y
Create the ambari database:
# mysql -uroot -pbigdata
MariaDB [(none)]> create database ambari;
MariaDB [(none)]> grant all privileges on ambari.* to 'ambari'@'localhost' identified by 'bigdata';
MariaDB [(none)]> grant all privileges on ambari.* to 'ambari'@'%' identified by 'bigdata';
MariaDB [(none)]> use ambari;
MariaDB [ambari]> source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
MariaDB [ambari]> quit

3.2 Installing and configuring ambari-server

# master
[root@master ~]# vi /etc/profile
export buildNumber=2.6.0.0

[root@master ~]# ambari-server setup
WARNING: SELinux is set to 'permissive' mode and temporarily disabled.
OK to continue [y/n] (y)? 
Customize user account for ambari-server daemon [y/n] (n)? n
Checking JDK...
[1] Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
[2] Oracle JDK 1.7 + Java Cryptography Extension (JCE) Policy Files 7
[3] Custom JDK
==============================================================================
Enter choice (1): 3
Path to JAVA_HOME: /usr/jdk64/jdk1.8.0_77
Validating JDK on Ambari Server...done.
Completing setup...
Configuring database... 
Enter advanced database configuration [y/n] (n)? y
Configuring database...
====================================================================
Choose one of the following options:
[1] - PostgreSQL (Embedded)
[2] - Oracle
[3] - MySQL
[4] - PostgreSQL
[5] - Microsoft SQL Server (Tech Preview)
[6] - SQL Anywhere
====================================================================
Enter choice (1): 3
Hostname (localhost): 
Port (3306): 
Database name (ambari): 
Username (ambari): 
Enter Database Password (bigdata): 
Proceed with configuring remote database connection properties [y/n] (y)? 
Ambari Server 'setup' completed successfully.

Register the MySQL JDBC driver with Ambari:
# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

Start the ambari-server service:
[root@master ~]# ambari-server start
Using python  /usr/bin/python
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start...................................
Server started listening on 8080

DB configs consistency check: no errors and warnings were found.
Ambari Server 'start' completed successfully.

Open the login page at http://10.0.0.131:8080/
The login username and password are admin / admin.


4. Configure ambari-agent

master & slave1
# yum -y install ambari-agent

# vi /etc/ambari-agent/conf/ambari-agent.ini
[server]
hostname=master.hadoop

# ambari-agent restart
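On a cluster with many agents, editing ambari-agent.ini by hand does not scale; the hostname line can be rewritten with sed instead. The snippet below demonstrates the edit on a local sample copy of the [server] section (the sample file name is an assumption):

```shell
# Create a sample copy of the [server] section of ambari-agent.ini.
printf '[server]\nhostname=localhost\n' > ambari-agent.ini.sample

# Point the agent at the Ambari server used in this guide.
sed -i 's/^hostname=.*/hostname=master.hadoop/' ambari-agent.ini.sample
grep '^hostname=' ambari-agent.ini.sample

# On the real nodes, run the same sed against
# /etc/ambari-agent/conf/ambari-agent.ini and then restart the agent:
# ambari-agent restart
```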
Origin blog.csdn.net/chengyinwu/article/details/104084840