Integrating Hue with HDP 2.5

Introduction

Hue is an open-source UI for Apache Hadoop. It started life as Cloudera Desktop and was later contributed to the open-source community by Cloudera; it is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data — for example, browsing data on HDFS, running MapReduce jobs, or issuing Hive queries.

If you run CDH, Hue comes bundled by default, but HDP does not ship it. Logging in to a server just to run a small job or a quick query is inconvenient, so let's install Hue ourselves.

Environment

  • CentOS 7
  • HDP 2.5
  • Hue 3.12.0

Installing Hue

Download version 3.12.0 from the official site:
http://gethue.com/hue-3-12-the-improved-editor-for-sql-developers-and-analysts-is-out/
Downloads from there can be slow inside China; I had already grabbed a copy and uploaded it to Baidu Cloud:
https://pan.baidu.com/s/1cCifuu

Copy the tarball to the target directory on the server and unpack it:
root@dell:/data/hue# tar -zxvf hue-3.12.0.tgz

Install the build dependencies:

root@dell:~# yum install ant gcc gcc-c++ mysql-devel openssl-devel cyrus-sasl-devel cyrus-sasl cyrus-sasl-gssapi sqlite-devel openldap-devel libacl-devel libxml2-devel libxslt-devel mvn krb5-devel python-devel python-simplejson python-setuptools

Build and install Hue:

root@dell:/data/hue# PREFIX=/usr/share make install
PREFIX sets the install path; pick a partition with plenty of free space.

The installation itself is straightforward. The main prerequisite is a working Maven setup: the build needs a large number of jars, which it downloads through Maven.
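If those Maven downloads are slow or keep timing out, routing Maven Central traffic through a nearby mirror before starting the build can help. A minimal `~/.m2/settings.xml` sketch — the Aliyun mirror URL is an example/assumption, substitute any mirror you trust:

```xml
<!-- ~/.m2/settings.xml: send requests for Maven Central to a mirror.
     The Aliyun URL below is an assumption; any reachable mirror works. -->
<settings>
  <mirrors>
    <mirror>
      <id>central-mirror</id>
      <mirrorOf>central</mirrorOf>
      <url>https://maven.aliyun.com/repository/central</url>
    </mirror>
  </mirrors>
</settings>
```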

Configuring Hue

Edit desktop/conf/hue.ini under the Hue install path.

Configuring the database

By default Hue uses a SQLite database; let's switch it to MySQL.
Open hue.ini and locate the following section:

[[database]]
     # Database engine is typically one of:
     # postgresql_psycopg2, mysql, sqlite3 or oracle.
     #
     # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
     # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
     # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
     # Note for MariaDB use the 'mysql' engine.
     ## engine=sqlite3  # change to mysql
     ## host=  # MySQL server hostname or IP
     ## port=  # 3306
     ## user=  # database user (creating a dedicated hue user is recommended)
     ## password= # that user's password
     # Execute this script to produce the database password. This will be used when 'password' is not set.
     ## password_script=/path/script
     ## name=desktop/desktop.db # change this to the database name, e.g. hue
     ## options={}
     # Database schema, to be used only when public schema is revoked in postgres
      ## schema=

After the changes, the section looks like this:

[[database]]
     # Database engine is typically one of:
     # postgresql_psycopg2, mysql, sqlite3 or oracle.
     #
     # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
     # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
     # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
     # Note for MariaDB use the 'mysql' engine.
     engine=mysql  # changed to mysql
     host=192.168.1.2  # MySQL server hostname or IP
     port=3306
     user=hue  # database user (a dedicated hue user is recommended)
     password=lu123456  # that user's password
     # Execute this script to produce the database password. This will be used when 'password' is not set.
     ## password_script=/path/script
     name=hue  # the database name, e.g. hue
     ## options={}
     # Database schema, to be used only when public schema is revoked in postgres
      ## schema=
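Before Hue can use this database, it has to exist in MySQL along with the hue user. A minimal sketch run as a MySQL administrator — the names and password mirror the config above, adjust to your environment:

```sql
-- Create the database and a dedicated hue user (values match hue.ini above).
CREATE DATABASE hue DEFAULT CHARACTER SET utf8;
CREATE USER 'hue'@'%' IDENTIFIED BY 'lu123456';
GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
FLUSH PRIVILEGES;
```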

After configuring the database, sync and migrate the schema into the MySQL database we pointed Hue at:

root@dell:/data/hue/hue-3.12.0# build/env/bin/hue syncdb
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue migrate

During the sync you may be prompted to create an admin user; you will need it to log in to Hue later.

Hadoop configuration

Edit hdfs-site.xml and add the following property:

<property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
</property>

HDP should have this enabled by default; check HDFS → Configs → Advanced → General → WebHDFS enabled.

Add the hue proxy user:
HDFS → Configs → Advanced → Custom core-site
Add these properties:
hadoop.proxyuser.hue.groups=*
hadoop.proxyuser.hue.hosts=*
Without them, Hue cannot submit jobs on users' behalf.
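In raw core-site.xml terms, the two Ambari properties above correspond to:

```xml
<!-- core-site.xml: allow the hue user to impersonate any user from any host -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
```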

On the Hue side, edit hue.ini and find the following:

[hadoop]

    # Configuration for HDFS NameNode
    # ------------------------------------------------------------------------
    [[hdfs_clusters]]
      # HA support by using HttpFs

      [[[default]]]
        # Enter the filesystem uri
        fs_defaultfs=hdfs://e5:8020  # NameNode URI
        # fs_defaultfs=hdfs://localhost:8020

        # NameNode logical name.
        ## logical_name=

        # Use WebHdfs/HttpFs as the communication mechanism.
        # Domain should be the NameNode or HttpFs host.
        # Default port is 14000 for HttpFs.
        webhdfs_url=http://e5:50070/webhdfs/v1  # WebHDFS endpoint

        # Change this if your HDFS cluster is Kerberos-secured
        security_enabled=false

        # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
        # have to be verified against certificate authority
       ## ssl_cert_ca_verify=True

       # Directory of the Hadoop configuration
      ## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
       hadoop_conf_dir=/etc/hadoop/conf  # Hadoop configuration path

  [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=e5
      # YARN → Configs → Advanced → Advanced yarn-site:
      # see the yarn.resourcemanager.address property

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8050

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      ## logical_name=

      # Change this if your YARN cluster is Kerberos-secured
      security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://e5:8088

      # URL of the ProxyServer API
      proxy_api_url=http://e5:8088

      # URL of the HistoryServer API
      ## history_server_api_url=http://localhost:19888

      # URL of the Spark History Server
      ## spark_history_server_url=http://localhost:18088

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

    # HA support by specifying multiple clusters.
    # Redefine different properties there.
    # e.g.

    # [[[ha]]]  # HA (high-availability) cluster configuration
      # Resource Manager logical name (required for HA)
      ## logical_name=my-rm-name

      # Un-comment to enable
      ## submit_to=True

      # URL of the ResourceManager API
      ## resourcemanager_api_url=http://localhost:8088

      # ...

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]

    [[[default]]]
      # Enter the host on which you are running the Hadoop JobTracker
      jobtracker_host=e5

      # The port where the JobTracker IPC listens on
      jobtracker_port=8050

      # JobTracker logical name for HA
      ## logical_name=

      # Thrift plug-in port for the JobTracker
      ## thrift_port=9290

      # Whether to submit jobs to this cluster
      submit_to=False

      # Change this if your MapReduce cluster is Kerberos-secured
      security_enabled=false

Only the Hadoop-related configuration is shown here. Other services such as Oozie and Sqoop are just as simple, but they do need their own sections configured in hue.ini if you plan to use them.
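For example, wiring up Oozie is only a couple of lines in the [liboozie] section of hue.ini. A sketch — the host here reuses the e5 node from above and 11000 is Oozie's default port, both assumptions for your cluster:

```ini
[liboozie]
  # URL of the Oozie server (assumed host; 11000 is the Oozie default port)
  oozie_url=http://e5:11000/oozie
```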

Starting the Hue service

Start Hue in development/debug mode with the following command:

# 0.0.0.0 allows connections from any host; without it the server binds to 127.0.0.1 and is reachable only locally
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue runserver 0.0.0.0:8888

Once it starts, open http://server-ip:8888 in a browser and you will be taken to the Hue login page. If no initial account was set, the default is admin/admin; if you created an admin user during the database sync above, log in with that user instead.
(screenshot: Hue login page)

With login working, we can register Hue as a system service so it can be controlled through systemd, which makes starting, stopping, and enabling at boot straightforward.

The script below is based on the hue init script shipped with CDH, lightly modified for our paths.

/etc/init.d/hue

#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#       /etc/rc.d/init.d/hue
#
#       Hue web server
#
# chkconfig: 2345 90 10
# description: Hue web server
# pidfile: /var/run/hue/supervisor.pid

. /etc/init.d/functions

LOCKFILE=/var/lock/subsys/hue
#DAEMON=/usr/lib/hue/build/env/bin/supervisor # Introduce the server's location here
DAEMON=/data/hue/hue-3.12.0/build/env/bin/supervisor # Introduce the server's location here
LOGDIR=/var/log/hue  # Log directory to use
PIDFILE=/var/run/hue/supervisor.pid
USER=hue
#EXEC=/usr/lib/hue/build/env/bin/python
EXEC=/data/hue/hue-3.12.0/build/env/bin/python
DAEMON_OPTS="-p $PIDFILE -l $LOGDIR -d"
HUE_SHUTDOWN_TIMEOUT=15

hue_start() {
        export PYTHON_EGG_CACHE='/tmp/.hue-python-eggs'
        #RE_REGISTER=/usr/lib/hue/.re_register
        RE_REGISTER=/data/hue/hue-3.12.0/app.reg
        if [ -e $RE_REGISTER ]; then
            # Do app_reg on upgraded apps. This is a workaround for DISTRO-11.
            # We can probably take it out after another release.
            DO="/sbin/runuser -s /bin/bash $USER -c"
            #APP_REG="/usr/lib/hue/tools/app_reg/app_reg.py"
            APP_REG="/data/hue/hue-3.12.0/tools/app_reg/app_reg.py"
            # Upgraded apps write their paths in the re_rgister file.
            RE_REG_LOG=/var/log/hue/hue_re_register.log

            # Make cwd somewhere that $USER can chdir into
            pushd / > /dev/null
            $DO "DESKTOP_LOG_DIR=$LOGDIR $EXEC $APP_REG --install $(cat $RE_REGISTER | xargs echo -n)  >> $RE_REG_LOG 2>&1"
            ok=$?
            popd > /dev/null
            if [ $ok -eq 0 ] ; then
                rm -f $RE_REGISTER
            else
                echo "Failed to register some apps: Details in $RE_REG_LOG"
            fi
        fi

        echo -n "Starting hue: "
        for dir in $(dirname $PIDFILE) $LOGDIR ${PYTHON_EGG_CACHE}
        do
            mkdir -p $dir
            chown -R $USER $dir
        done

        # Check if already running
        if [ -e $PIDFILE ] && checkpid $(cat $PIDFILE) ; then
            echo "already running"
            return 0
        fi
        # the supervisor itself will setuid down to $USER
        su -s /bin/bash $USER -c "$DAEMON $DAEMON_OPTS"
        ret=$?
        base=$(basename $0)
        if [ $ret -eq 0 ]; then
            sleep 5
            test -e $PIDFILE && checkpid $(cat $PIDFILE)
            ret=$?
        fi
        if [ $ret -eq 0 ]; then
            touch $LOCKFILE
            success $"$base startup"
        else
            failure $"$base startup"
        fi
        echo
        return $ret
}

hue_stop() {
        if [ ! -e $PIDFILE ]; then
            success "Hue is not running"
            return 0
        fi

        echo -n "Shutting down hue: "

        HUE_PID=`cat $PIDFILE 2>/dev/null`
        if [ -n "$HUE_PID" ]; then
          kill -TERM ${HUE_PID} &>/dev/null
          for i in `seq 1 ${HUE_SHUTDOWN_TIMEOUT}` ; do
            kill -0 ${HUE_PID} &>/dev/null || break
            sleep 1
          done
          kill -KILL ${HUE_PID} &>/dev/null
        fi
        echo
        rm -f $LOCKFILE $PIDFILE
        return 0
}

hue_restart() {
  hue_stop
  hue_start
}

case "$1" in
    start)
        hue_start
        ;;
    stop)
        hue_stop
        ;;
    status)
        status -p $PIDFILE supervisor
        ;;
    restart|reload)
        hue_restart
        ;;
    condrestart)
        [ -f $LOCKFILE ] && hue_restart || :
        ;;
    *)
        echo "Usage: hue {start|stop|status|reload|restart|condrestart}"
        exit 1
        ;;
esac
exit $?

Only the values of the following variables need to be adjusted to your install path:

DAEMON
EXEC

RE_REGISTER
APP_REG

Note that the script must be placed under /etc/init.d/ (and made executable).
After that we can control the service with systemctl, e.g.:

# start the service
# systemctl start hue

# stop the service
# systemctl stop hue

# enable the service at boot
# systemctl enable hue
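The hue_stop() function in the init script uses a TERM-then-KILL shutdown pattern that is worth knowing on its own: send SIGTERM, poll up to a timeout for the process to exit, then SIGKILL as a last resort. A standalone sketch, using a sleep process as a stand-in for the Hue supervisor:

```shell
#!/bin/bash
# TERM-then-KILL graceful shutdown, as in hue_stop():
TIMEOUT=5
sleep 60 &                      # stand-in for the Hue supervisor process
PID=$!
kill -TERM "$PID" 2>/dev/null   # ask the process to exit cleanly
for i in $(seq 1 "$TIMEOUT"); do
    kill -0 "$PID" 2>/dev/null || break   # already gone? stop waiting
    sleep 1
done
kill -KILL "$PID" 2>/dev/null   # force it only if still alive
wait "$PID" 2>/dev/null
kill -0 "$PID" 2>/dev/null && echo "still running" || echo "stopped"
```

The polling loop is what makes the shutdown graceful: the process gets up to TIMEOUT seconds to flush and exit on its own before being killed outright.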

Summary

Installing and configuring Hue is not hard, but on first contact you will inevitably hit all sorts of issues: Python version, database configuration, Hadoop service configuration, and so on. Don't panic — read the error message first, then the logs, then search Baidu/Google or consult the official docs; there is always a way through. Finally, if your company's servers can afford it, consider CDH instead: the bundled services are more complete, and with commercial backing you can escalate truly hard problems to Cloudera.


Reposted from blog.csdn.net/lusyoe/article/details/72896480