Analysis and comparison of Ambari and Cloudera Manager

Chapter 1 Introduction

Anyone who has operated a Hadoop cluster knows that the Hadoop ecosystem is difficult to work with at every stage, from installation and configuration through to ongoing operation and maintenance. Installing Hadoop can take several days, and even a small cluster needs several people to operate and maintain it. Ambari and Cloudera Manager both aim to simplify the installation and configuration of Hadoop ecosystem clusters, make Hadoop operation and maintenance more efficient, and monitor the cluster.

1.1. Overview of Cloudera Manager/Ambari

1.1.1. Ambari

Ambari is Hortonworks' open source Hadoop platform management software. It focuses on helping people manage their own HDP clusters. It covers the basic functions of installing, managing, and operating Hadoop components, and it provides a Web UI for visual cluster management, which makes the big data platform easier to install and use.

1.1.2. Cloudera Manager

Cloudera Manager is a product of Cloudera that focuses on helping you manage your own CDH clusters. It is a software tool that provides automatic cluster installation, centralized management, cluster monitoring, and alerting. It shortens cluster installation from days to a few hours and reduces the operation and maintenance staff from dozens of people to a handful, which greatly improves the efficiency of cluster management.

1.2. Comparison of manual and tool-based (Cloudera Manager/Ambari) deployment of a Hadoop cluster:

1.2.1. Manual deployment:

(1)  Component selection: native Apache Hadoop components

(2)  Advantages:

a) Each component is completely open source and free.

b) The community is active and its resources are plentiful.

c) Doing the work by hand deepens your understanding and mastery of each component.

(3)  Disadvantages:

a) Cluster deployment: Deploying, installing, and configuring the cluster takes a lot of time. A large number of configuration files usually have to be written to suit the cluster and then distributed to every node, which is error-prone and inefficient.

b) Version management: Managing component versions is relatively complex. Choosing and using Hadoop ecosystem components such as Hive, Mahout, Sqoop, Flume, Spark, Impala, and Oozie raises many compatibility questions: whether the versions are compatible, whether the components conflict, whether they compile, and so on. A lot of time is often wasted compiling components and resolving version conflicts.

c) Cluster operation and maintenance: Monitoring and operating the cluster requires third-party software such as Ganglia and Nagios. Operation and maintenance is difficult and relatively costly.
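To make disadvantage a) concrete, here is a minimal sketch of the kind of file a manual deployment has to produce and copy around. It renders a dictionary of settings into the `<configuration>/<property>` XML layout that Hadoop's `*-site.xml` files use. The host names and the `fs.defaultFS` value are hypothetical, and the helper `render_hadoop_config` is an illustrative name, not part of any Hadoop tool.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties):
    """Render a dict of settings as a Hadoop-style *-site.xml document."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical hosts and values, for illustration only.
nodes = ["node1.example.com", "node2.example.com", "node3.example.com"]
core_site = render_hadoop_config({"fs.defaultFS": "hdfs://node1.example.com:8020"})

# In a manual deployment the same text must be copied to every node
# (for example with scp) and kept in sync by hand after each change.
files_to_distribute = {node: core_site for node in nodes}
```

Every configuration change repeats this render-and-copy cycle for each file and each node, which is where the errors tend to creep in.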

1.2.2. Tool Deployment

(1)  Component selection: Ambari + HDP or Cloudera Manager + CDH

(2)  Advantages:

a) The distributions are based on stable versions of Apache Hadoop. Conflicts between component versions have already been resolved, and the latest bug fixes and feature patches have been applied to improve compatibility, security, and stability.

b) Many parameters are tuned by default, such as Snappy compression in HDFS.

c) Deployment, installation, and configuration tools are provided, which greatly improves deployment efficiency: a cluster can be deployed within a few hours.

d) Third-party distributions are usually verified by extensive testing, have many deployment instances, and run in a wide range of production environments.

e) Simple operation and maintenance. Tools are provided for management, monitoring, diagnosis, and configuration changes, so the cluster is convenient to manage and configure, problems can be located quickly and accurately, and operation and maintenance work becomes simple and effective.

(3)  Disadvantages

a) The free community edition is not fully functional, and the non-community edition is paid.

b) Pricing: Charges are per node. Cloudera costs $4,000 per node per year, and Hortonworks costs $1,250 per node per year.
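As a quick worked example of per-node pricing, the sketch below multiplies the two prices quoted above by a cluster size. The 20-node cluster is a hypothetical figure chosen only for illustration.

```python
# Per-node annual prices quoted above, in US dollars.
PRICE_PER_NODE = {"Cloudera": 4000, "Hortonworks": 1250}


def annual_cost(vendor, nodes):
    """Return the yearly subscription cost for a cluster of `nodes` hosts."""
    return PRICE_PER_NODE[vendor] * nodes


# Hypothetical 20-node cluster.
for vendor in PRICE_PER_NODE:
    print(vendor, annual_cost(vendor, 20))
```

At these prices a 20-node cluster comes to $80,000 per year with Cloudera and $25,000 per year with Hortonworks.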

Chapter 2 Ambari Overview

2.1 Introduction to Ambari

Ambari is Hortonworks' open source Hadoop platform management software. It has basic functions such as installation, management, and operation and maintenance of Hadoop components. It provides a Web UI for visual cluster management, which simplifies the installation and use of the big data platform.

Ambari currently targets the Hortonworks Data Platform (HDP), Hortonworks' open source Hadoop distribution. No production deployment of Ambari on top of plain Apache Hadoop was found; if you have already installed Apache Hadoop or another Hadoop platform, you may not be able to manage it with Ambari.

2.2 Ambari function list

2.2.1 Operation level:

(1) Host Level Action (machine level operation)

(2) Component Level Action (module-level operation)

2.2.2 Role-based user management, 5 roles:

(1) Cluster User

Can view cluster and service information, such as configuration, service status, and health status. Read-only.

(2) Service Operator

Can operate the service life cycle, such as start and stop, and can also perform operations such as Rebalance DataNode and YARN refresh.

(3) Service Administrator

In addition to the Service Operator permissions, can configure services, move the NameNode, and enable HA.

(4) Cluster Operator

In addition to the Service Administrator permissions, can operate on hosts and components, such as adding and deleting them.

(5) Cluster Administrator

The super administrator of the cluster, with full rights to operate any component.
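The five roles build on one another, so each role can be modelled as the previous role plus a few extra capabilities. The sketch below illustrates that cumulative structure only; the capability labels are made up for the example and are not Ambari's internal permission names.

```python
# Roles ordered from least to most privileged, as listed above.
ROLES = [
    "Cluster User",
    "Service Operator",
    "Service Administrator",
    "Cluster Operator",
    "Cluster Administrator",
]

# Capabilities each role adds on top of the previous one (illustrative labels).
ADDED = {
    "Cluster User": {"view"},
    "Service Operator": {"start_stop_service", "rebalance", "yarn_refresh"},
    "Service Administrator": {"configure_service", "move_namenode", "enable_ha"},
    "Cluster Operator": {"add_host", "delete_host", "add_component"},
    "Cluster Administrator": {"manage_everything"},
}


def capabilities(role):
    """Union of the capabilities of `role` and every role below it."""
    level = ROLES.index(role)
    granted = set()
    for name in ROLES[: level + 1]:
        granted |= ADDED[name]
    return granted


def can(role, capability):
    """Check whether `role` includes `capability`."""
    return capability in capabilities(role)
```

For example, `can("Service Administrator", "start_stop_service")` is true because the capability is inherited from Service Operator, while `can("Service Operator", "enable_ha")` is false.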

2.2.3 Dashboard Monitoring

(1) Rolling start. Services are started in an order determined by their dependencies. For example, HBase depends on HDFS and Zookeeper, so Ambari starts HDFS and Zookeeper first and then starts HBase.

(2) Key operation and maintenance metrics (measures and indicators of cluster health).

(3) The Service list is on the left. The middle section shows the components (modules) of the selected Service, that is, which components it has and how many of each. The Service Action button in the upper right corner offers operations such as starting, stopping, and deleting the service.

(4) Quick links (go directly to a component's native management interface).
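Starting services in dependency order, as described in item (1), is a topological sort. The sketch below shows the idea with Python's standard `graphlib` module (Python 3.9 or later); it is not Ambari's own implementation. Only the HBase dependencies come from the text, and the YARN entry is a hypothetical addition.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on.
# Only the HBASE entry comes from the text; YARN is a hypothetical addition.
DEPENDS_ON = {
    "HDFS": set(),
    "ZOOKEEPER": set(),
    "HBASE": {"HDFS", "ZOOKEEPER"},
    "YARN": {"HDFS"},
}


def start_order(depends_on):
    """Return the services ordered so that dependencies start first."""
    return list(TopologicalSorter(depends_on).static_order())


order = start_order(DEPENDS_ON)
```

In the resulting list HBASE always appears after both HDFS and ZOOKEEPER; stopping the services would walk the same list in reverse.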

2.3 Introduction to Alert

(1) Alert levels:
OK, Warning, Critical, Unknown, None

(2) Alert types:
WEB, Port, Metric, Aggregate, and Script

Table 1. Comparison of Alert types in Ambari

| Type | Use | Alert levels | Threshold configurable | Unit |
|---|---|---|---|---|
| PORT | Monitors whether a port on the machine is available | OK, WARN, CRIT | Yes | Seconds |
| METRIC | Monitors metric-related configuration properties | OK, WARN, CRIT | Yes | Variable |
| AGGREGATE | Collects the status of other Alerts | OK, WARN, CRIT | Yes | Percentage |
| WEB | Monitors whether a Web UI (URL) address is available | OK, WARN, CRIT | No | None |
| SCRIPT | Monitoring logic is executed by a custom Python script | OK, CRIT | No | None |
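The threshold-based alert types in Table 1 can be read as simple classification rules. The sketch below illustrates a PORT-style check, which compares a response time in seconds against two thresholds, and an AGGREGATE-style check, which compares the fraction of failing child alerts against two thresholds. The function names and default threshold values are assumptions for illustration, not Ambari's code or its shipped defaults.

```python
def evaluate_port_alert(response_seconds, warning=1.5, critical=5.0):
    """Classify a port check by how long the connection took, in seconds.

    A response time of None means the connection failed outright.
    """
    if response_seconds is None or response_seconds >= critical:
        return "CRIT"
    if response_seconds >= warning:
        return "WARN"
    return "OK"


def evaluate_aggregate_alert(child_states, warning=0.1, critical=0.3):
    """Classify a group of alerts by the fraction that are not OK."""
    if not child_states:
        return "OK"
    failing = sum(1 for state in child_states if state != "OK")
    ratio = failing / len(child_states)
    if ratio >= critical:
        return "CRIT"
    if ratio >= warning:
        return "WARN"
    return "OK"
```

With these assumed thresholds, a port that answers in 2 seconds yields `WARN`, and a group in which half of the child alerts are failing yields `CRIT`.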

2.4 Hadoop representative component function description

2.4.1 HDFS

  • Start, stop, and restart HDFS. Deleting HDFS is also supported, provided that the services that depend on HDFS are deleted first
  • Advanced configuration 
    supports advanced configuration of core-site.xml and hdfs-site.xml
  • Download configuration file
  • Status view 
    the health status of the NameNode and SNameNode, the nodes they run on, hard disk usage, and block status (number of lost and conflicting blocks)
  • File viewing 
    embeds the native HDFS directory browser; there is no one-click upload or download function
  • Log viewing 
    QuickLinks lead to the native HDFS log viewing Web UI. The interface has not been optimized, and there are no auxiliary functions (such as search) for log viewing
  • Move NameNode, SNameNode
  • Rebalance HDFS 
    distributes blocks evenly across DataNodes
  • NameNode UI 
    QuickLinks lead to the native HDFS UI
  • HA 
    one-click configuration of NameNode high availability, using JournalNode and NFS as shared storage

2.4.2 Zookeeper

  • Start, stop, and restart the Zookeeper cluster
  • Status view 
    the health status of the Zookeeper Server and Client, and the nodes they run on
  • Advanced configuration 
    zoo.cfg, log output format (log4j configuration)
  • Add Zookeeper Server node
  • Download configuration file

2.4.3 HBase

  • Start the HBase cluster, start RegionServers, stop the cluster, delete the HBase cluster
  • Add HBase Master node
  • Status view 
    the status of the HBase Master and RegionServers and the nodes they run on, master startup time, and average load (regions per RegionServer)
  • Advanced configuration 
    of the HBase Master, RegionServer, and Client: memory limits, heartbeat time, etc. Kerberos can be enabled (provided the Service is installed), and Phoenix SQL can be enabled
  • Log viewing 
    QuickLinks lead to the native log viewing Web UI
  • Master UI 
    QuickLinks lead to the native HBase UI

2.4.5 Kafka

  • Start, stop, and restart Kafka, restart Brokers, delete the Service
  • Advanced configuration 
    of the Kafka Broker, Producer, and Consumer. The Broker supports connection parameter settings, topic configuration, log configuration, etc.
  • Status view 
    the status and node location of the Broker. Combined with Ambari Metrics, more status can be viewed, such as Topics, Controller, and Replica

2.5 Ambari Summary

Ambari integrates Hadoop components through HDP, and provides a combination of services in the form of stacks. The main problems it solves are as follows:

  • Simplified deployment. The services supported in the HDP stack only need to be installed through the graphical interface, and the node that hosts each master can be specified easily, so the cluster is up and running quickly.
  • Cluster status monitoring through Ambari Metrics, with data (CPU, memory, load, etc.) displayed through the integrated Grafana.
  • Advanced configuration of each Service. After the cluster is deployed, parameters (such as the core-site of HDFS) can be modified easily through the dashboard.
  • Quick Links. Ambari provides quick links to the native management interfaces of Hadoop components.
  • Node expansion, such as adding an HBase Master.
  • Customizable Alerts. Ambari's alerts can be customized, so users can decide which situations raise an alert and which do not.
  • Value-added features, such as Rebalance DataNode for HDFS and HA for the NameNode.
  • Ambari's own RBAC-based user management, which grants users management rights over the Hadoop cluster.

Ambari does not build in many functions that belong to the Hadoop components themselves, such as log analysis. It only provides functions such as installation, configuration, start, and stop, and it tries to stay isolated from the native Hadoop components. Its links lead directly to the native management interfaces (such as the HBase Master UI), so its approach remains minimally intrusive to the Hadoop components.

Chapter 3 Overview of Cloudera Manager

3.1 Introduction to Cloudera Manager

Cloudera Manager is a product of Cloudera that focuses on helping you manage your own CDH cluster. Through its unified UI, CDH and its related components can be configured and deployed quickly and automatically. Cloudera Manager also provides a rich set of customizable monitoring, diagnosis, and reporting functions, unified log management across the cluster, unified configuration management with real-time configuration changes, multi-tenancy, high-availability and disaster-recovery deployment, and automatic recovery, all of which help you maintain your own data center. It comes in two editions: a free Express edition and a full-featured paid Enterprise edition that offers numerous value-added services.

3.2 Brief description of Cloudera Manager functions:

Management: Manage big data clusters, for example by adding and deleting nodes.

Monitoring: Monitor the health of the cluster, and comprehensively monitor the configured metrics and the running state of the system.

Diagnosis: Diagnose problems that occur in the cluster and suggest solutions.

Integration: Integrate multiple Hadoop components.

3.3 Core Components of Cloudera Manager

At the core of Cloudera Manager is the management server, which hosts the web server and application logic of the management console and is responsible for installing software, configuring, starting, and stopping services, and managing the cluster on which the services run.

Agent : Installed on every host. The agent is responsible for starting and stopping processes, unpacking configurations, triggering installations, and monitoring the host.
Management Service : A service consisting of a set of roles that perform various monitoring, alerting, and reporting functions.
Database : Stores configuration and monitoring information. Typically, multiple logical databases run across one or more database servers. For example, the Cloudera Manager server and the monitoring roles use different logical databases.
Cloudera Repository : The repository of software that Cloudera Manager distributes.
Clients : The interfaces for interacting with the server:

Admin Console : Web-based user interface with which administrators manage clusters and Cloudera Manager.
API : API with which developers create custom Cloudera Manager applications.
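As a rough illustration of the API client, the sketch below builds (without sending) an authenticated GET request against the Cloudera Manager REST API. It assumes the server listens on port 7180 and that the `/api/version` path reports the API version the server supports; the host name, credentials, and the helper `build_cm_request` are hypothetical.

```python
import base64
import urllib.request


def build_cm_request(host, path, user, password, port=7180):
    """Build (but do not send) an authenticated GET request for the CM API."""
    url = f"http://{host}:{port}/api/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, for illustration only.
req = build_cm_request("cm.example.com", "version", "admin", "admin")
# urllib.request.urlopen(req) would send it to a reachable server.
```

A custom application would typically call the version endpoint first and then address versioned resources such as the cluster list under that version prefix.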

 

Chapter 4 The native Web UI of the Hadoop ecosystem

4.1 Introduction to the native Web UI

The native Web UIs of the Apache Hadoop components show each component's running status, historical logs, memory consumption, hard disk consumption, and so on.

Compared with Cloudera Manager and Ambari:

(1) The native Web UI cannot start or stop services.

(2) Each component's Web UI is independent of the others.

(3) Most functions are read-only; no other operations can be performed.

(4) It is difficult to monitor the operation of the entire cluster in a unified way.

Chapter 5 Comparison of Ambari and Cloudera Manager

 

| Item | Ambari | Cloudera Manager |
|---|---|---|
| Developer | Hortonworks | Cloudera |
| Deployment method | Ambari + HDP | Cloudera Manager + CDH |
| Production operation | Relatively stable | Very stable |
| Usage | High market share | High market share |
| Open source status | The product is open source; the service appears to be paid | Free and commercial versions are available |
| Maintenance | Relies on the strength of the community | Cloudera has done some custom development; self-maintenance or patching drifts further and further from the community |
| Configuration version control and history | Supported | Not supported |
| Secondary development | Supported | Not supported |
| Integration | Supported | Not supported (no Redis, Kylin, or ES) |
| Access control | Ranger (relatively simple) | Sentry (complex) |
| View customization | Supports creating your own views and adding custom services | Not supported |

 

Chapter 6 Version Selection Analysis

When deciding whether to adopt a piece of software in an open source environment, we usually consider the following factors:

(1) Whether it is open source software, that is, whether it is free.

(2) Whether there is a stable version. The software's official website usually states this.

(3) Whether it has been proven in practice. This can be checked by seeing whether larger companies already use it in production.

(4) Whether there is strong community support, so that when problems occur later, solutions can be found quickly through community resources and forums.

Chapter 7 Ambari Supplementary Information

Ambari is a configuration management tool for distributed Hadoop clusters, an open source project led by Hortonworks. It has become a top-level project of the Apache Foundation and a valuable assistant in the Hadoop operation and maintenance toolchain, and it has attracted attention from both industry and academia.

Ambari does not adopt a new idea or architecture, nor does it bring about a revolution in software. Instead, it makes full use of existing, excellent open source software and skillfully combines it to provide cluster service management, monitoring, and display capabilities in a distributed environment. The open source software it uses includes:

    • On the agent side, puppet is used to manage nodes;
    • On the Web side, ember.js is used as the front-end MVC framework together with NodeJS-related tools, handlebars.js is used as the page rendering engine, and the Bootstrap framework is used for CSS/HTML;
    • On the server side, Jetty, Spring, JAX-RS, etc. are used;
    • It also draws on the distributed monitoring capabilities of Ganglia and Nagios.

The Ambari architecture adopts the Server/Client model and consists mainly of two parts: ambari-agent and ambari-server. Ambari relies on other mature tools: ambari-server relies on python, ambari-agent relies on ruby, puppet, facter, and other tools, and the monitoring tools nagios and ganglia are used to monitor cluster status. Of these:

  1. Puppet is a configuration management tool for distributed clusters that follows a typical Server/Client model. It centrally manages the installation, configuration, and deployment of distributed clusters, and its main language is ruby.
  2. facter is a node resource collection library written in python that collects system information about a node, such as OS information and host information. Since ambari-agent is mainly written in python, facter fits well for collecting node information.

1. Ambari system architecture

In addition to ambari-server and ambari-agent, Ambari provides a clean management and monitoring interface, ambari-web, whose pages are served by ambari-server. ambari-server exposes REST APIs, which fall into two main categories: one provides management and monitoring services for ambari-web, and the other interacts with ambari-agent, accepting the heartbeat requests that ambari-agent sends to ambari-server. The following figure is the system architecture of Ambari. The master module accepts requests from the API and the Agent Interface and carries out the centralized management and monitoring logic of ambari-server, while each agent node is only responsible for collecting and maintaining the state of its own node.
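The management side of the REST API can be illustrated with a request that asks the server to start a service. The sketch below builds (without sending) such a request using only the standard library. It assumes the default server port 8080 and the `/api/v1/clusters/.../services/...` resource layout; the server name, cluster name, credentials, and the helper `build_start_service_request` are hypothetical.

```python
import base64
import json
import urllib.request


def build_start_service_request(server, cluster, service, user, password):
    """Build (but do not send) a PUT request asking Ambari to start a service."""
    url = f"http://{server}:8080/api/v1/clusters/{cluster}/services/{service}"
    payload = {
        "RequestInfo": {"context": f"Start {service} via REST"},
        "Body": {"ServiceInfo": {"state": "STARTED"}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode(), method="PUT"
    )
    request.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on calls that modify state.
    request.add_header("X-Requested-By", "ambari")
    return request


# Hypothetical server, cluster, and credentials, for illustration only.
req = build_start_service_request(
    "ambari.example.com", "demo", "HDFS", "admin", "admin"
)
# urllib.request.urlopen(req) would send it to a reachable server.
```

The request body states the state the caller wants the service to reach rather than a command to run, which matches the desired-state model the server uses internally.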

 

2. Internal architecture of Ambari-Agent

ambari-agent is stateless. Its work is divided into two main parts:

  1. Collect information about the node it runs on and send it to ambari-server in a heartbeat report;
  2. Handle execution requests from ambari-server.

It therefore has two kinds of queues:

  1. The message queue (MessageQueue, also called ResultQueue). It holds node status information (including registration information) and execution results, which are aggregated and sent to ambari-server with the heartbeat;
  2. The action queue (ActionQueue). It receives the state operations returned by ambari-server; the executor then calls modules such as puppet or python scripts in sequence to complete the tasks.
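The two-queue design can be sketched in a few lines. The toy class below is not Ambari's agent code: the class name, method names, message fields, and command strings are all hypothetical, and the executor is a stand-in for the puppet or python modules mentioned above.

```python
import queue


class AgentSketch:
    """Toy agent with the two queues described above (not Ambari's real code)."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.action_queue = queue.Queue()  # commands received from the server
        self.result_queue = queue.Queue()  # results awaiting the next heartbeat

    def receive(self, commands):
        """Store commands that came back in a heartbeat response."""
        for command in commands:
            self.action_queue.put(command)

    def run_pending(self, executor):
        """Execute queued commands in order and record each result."""
        while not self.action_queue.empty():
            command = self.action_queue.get()
            self.result_queue.put({"command": command, "result": executor(command)})

    def build_heartbeat(self):
        """Drain the result queue into one heartbeat message."""
        reports = []
        while not self.result_queue.empty():
            reports.append(self.result_queue.get())
        return {"hostname": self.hostname, "reports": reports}


agent = AgentSketch("node1.example.com")
agent.receive(["START DATANODE", "START REGIONSERVER"])
agent.run_pending(lambda command: "COMPLETED")  # stand-in executor
heartbeat = agent.build_heartbeat()
```

Because the agent keeps nothing except what sits in these queues, it can be restarted without losing cluster state; the authoritative state lives on the server.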

 

3. Internal Architecture of Ambari-Server

ambari-server is stateful: it maintains its own finite state machines (FSM). These state machines are persisted in the database, and early versions mainly used postgres. As shown in the figure below, the server side mainly maintains three types of state:

  1. Live Cluster State: The current state of the cluster. The state information reported by each node changes this state;
  2. Desired State: The state the user wants a node to be in. The user has performed a series of operations on the page that require the state of some services to change, but these changes have not yet taken effect on the node;
  3. Action State: The state of an operation requested while the state is changing. It can be regarded as an intermediate state that assists the transition from the Live Cluster State to the Desired State.
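The relationship between the three states can be sketched as a small reconciliation loop: compare the desired state with the live state, create a pending action for every difference, and mark the action complete when a heartbeat reports the new state. The function names, dictionary fields, and state labels below are illustrative assumptions, not Ambari's internal data model.

```python
def plan_actions(live_state, desired_state):
    """Return the actions needed to move each component to its desired state."""
    actions = []
    for component, wanted in sorted(desired_state.items()):
        current = live_state.get(component, "UNKNOWN")
        if current != wanted:
            actions.append(
                {"component": component, "from": current, "to": wanted,
                 "status": "PENDING"}
            )
    return actions


def apply_report(live_state, actions, component, reported_state):
    """Update the live state from a heartbeat and complete matching actions."""
    live_state[component] = reported_state
    for action in actions:
        if action["component"] == component and action["to"] == reported_state:
            action["status"] = "COMPLETED"


# Illustrative component names and states.
live = {"DATANODE": "INSTALLED", "NAMENODE": "STARTED"}
desired = {"DATANODE": "STARTED", "NAMENODE": "STARTED"}

actions = plan_actions(live, desired)           # one pending action for DATANODE
apply_report(live, actions, "DATANODE", "STARTED")
remaining = plan_actions(live, desired)         # nothing left to do
```

Once the live state matches the desired state, planning again produces no actions, which is the steady state the server works toward.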

 

The Heartbeat Handler module of ambari-server receives heartbeat requests from each agent. A heartbeat request mainly carries two types of information: node status information and returned operation results. The handler passes the node status information to the FSM, which maintains the node's state, and passes the returned operation results to the Action Manager for further processing.

The Coordinator module can also be called the API handler. After receiving an operation request from the Web side, it checks whether the request meets the requirements. The Stage Planner then decomposes the request into a set of operations, which are finally handed to the Action Manager for execution.

Therefore, as can be seen from the above figure, every maintenance action and change to ambari-server's state information is recorded in the database, and when a user performs operations that change a service, corresponding records are written to the database as well. The change history can thus be obtained from the database.

 
