Integrating Hue with HDP 2.5

Introduction

Hue is an open source Apache Hadoop UI. It evolved from Cloudera Desktop and was contributed to the open source community by Cloudera, and it is built on the Python web framework Django. With Hue, we can interact with a Hadoop cluster from a browser-based web console to analyze and process data, such as manipulating files on HDFS, running MapReduce jobs, executing Hive queries, and so on.

If you use the CDH platform, the Hue service is integrated by default, but the HDP platform does not include it. Logging in to the server every time I need to run a small query is extremely inconvenient, so I decided to simply install Hue.

Environment

  • CentOS 7
  • HDP 2.5
  • Hue 3.12.0

Install Hue

Download version 3.12.0 from the official website:
http://gethue.com/hue-3-12-the-improved-editor-for-sql-developers-and-analysts-is-out/
Downloading from within China can be very slow, though. I happened to have downloaded it in advance and have uploaded a copy to Baidu Cloud:
https://pan.baidu.com/s/1cCifuu

Upload the archive to the desired directory on the server and extract it:
root@dell:/data/hue# tar -zxvf hue-3.12.0.tgz

Install dependencies:

root@dell:~# yum install ant gcc gcc-c++ mysql-devel openssl-devel cyrus-sasl-devel cyrus-sasl cyrus-sasl-gssapi sqlite-devel openldap-devel libacl-devel libxml2-devel libxslt-devel mvn krb5-devel python-devel python-simplejson python-setuptools

Compile and install hue:

root@dell:/data/hue# PREFIX=/usr/share make install
PREFIX specifies the installation path; it is best to pick a partition with plenty of free space.

The installation process is relatively simple. The main prerequisite is to configure Maven in advance: the build needs a large number of jar packages, which are downloaded through Maven.
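If the default Maven repositories are slow to reach (a common problem in China, as noted above), a repository mirror speeds the build up considerably. A minimal ~/.m2/settings.xml sketch, assuming the Aliyun public mirror:

<settings>
  <mirrors>
    <mirror>
      <!-- Route all repository requests through the Aliyun public mirror -->
      <id>aliyun</id>
      <mirrorOf>*</mirrorOf>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>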

Configure Hue

Edit desktop/conf/hue.ini under the Hue installation path.

Configure the database

Hue uses a SQLite database by default; you can switch it to MySQL. Open the hue.ini file and find the following section:

[[database]]
     # Database engine is typically one of:
     # postgresql_psycopg2, mysql, sqlite3 or oracle.
     #
     # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
     # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
     # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
     # Note for MariaDB use the 'mysql' engine.
     ## engine=sqlite3  # change to mysql
     ## host=  # MySQL server hostname or IP
     ## port=  # 3306
     ## user=  # database user (creating a dedicated hue user is recommended)
     ## password= # password for that user
     # Execute this script to produce the database password. This will be used when 'password' is not set.
     ## password_script=/path/script
     ## name=desktop/desktop.db # change this to the database name, e.g. hue
     ## options={}
     # Database schema, to be used only when public schema is revoked in postgres
      ## schema=

After the modifications, the section looks like this:

[[database]]
     # Database engine is typically one of:
     # postgresql_psycopg2, mysql, sqlite3 or oracle.
     #
     # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
     # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
     # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
     # Note for MariaDB use the 'mysql' engine.
     engine=mysql  # changed to mysql
     host=192.168.1.2  # MySQL server hostname or IP
     port=3306
     user=hue  # database user (a dedicated hue user is recommended)
     password=lu123456  # password for that user
     # Execute this script to produce the database password. This will be used when 'password' is not set.
     ## password_script=/path/script
     name=hue  # the database name, e.g. hue
     ## options={}
     # Database schema, to be used only when public schema is revoked in postgres
      ## schema=

After configuring the database, you also need to sync and migrate Hue's schema into the MySQL database we specified. Make sure the database and user actually exist first, then run the sync commands below.
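Creating the database and user is a one-time step on the MySQL server. A minimal sketch using the example credentials above (substitute your own host, user, and password):

root@dell:~# mysql -u root -p
mysql> CREATE DATABASE hue DEFAULT CHARACTER SET utf8;
mysql> CREATE USER 'hue'@'%' IDENTIFIED BY 'lu123456';
mysql> GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
mysql> FLUSH PRIVILEGES;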

root@dell:/data/hue/hue-3.12.0# build/env/bin/hue syncdb
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue migrate

During the sync you may be prompted to create an admin user; you will need it to log in to Hue later.

Hadoop configuration

Edit hdfs-site.xml and add the following property:

<property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
</property>

In HDP this should already be enabled by default; check HDFS -> Configs -> Advanced -> General -> WebHDFS enabled.
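You can quickly confirm that WebHDFS responds with a REST call against the NameNode (a sketch; e5 is this cluster's NameNode host):

root@dell:~# curl -s "http://e5:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hue"

A JSON listing of the HDFS root directory means WebHDFS is reachable.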

Add a proxy user for hue: HDFS -> Configs -> Advanced -> Custom core-site, and add the properties:

hadoop.proxyuser.hue.groups=*
hadoop.proxyuser.hue.hosts=*

Without these, jobs cannot be submitted through Hue.
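If you maintain core-site.xml by hand instead of through Ambari, the equivalent entries would look like this (a sketch of the same two properties):

<property>
   <name>hadoop.proxyuser.hue.groups</name>
   <value>*</value>
</property>
<property>
   <name>hadoop.proxyuser.hue.hosts</name>
   <value>*</value>
</property>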

Back on the Hue side, edit hue.ini and find the following:

[hadoop]

    # Configuration for HDFS NameNode
    # ------------------------------------------------------------------------
    [[hdfs_clusters]]
      # HA support by using HttpFs

      [[[default]]]
        # Enter the filesystem uri
        fs_defaultfs=hdfs://e5:8020  # the NameNode address
        # fs_defaultfs=hdfs://localhost:8020

        # NameNode logical name.
        ## logical_name=

        # Use WebHdfs/HttpFs as the communication mechanism.
        # Domain should be the NameNode or HttpFs host.
        # Default port is 14000 for HttpFs.
        webhdfs_url=http://e5:50070/webhdfs/v1  # the WebHDFS URL

        # Change this if your HDFS cluster is Kerberos-secured
        security_enabled=false

        # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
        # have to be verified against certificate authority
       ## ssl_cert_ca_verify=True

       # Directory of the Hadoop configuration
      ## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
       hadoop_conf_dir=/etc/hadoop/conf  # path to the Hadoop configuration

  [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=e5
      # See YARN -> Configs -> Advanced -> Advanced yarn-site,
      # the yarn.resourcemanager.address property

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8050

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      ## logical_name=

      # Change this if your YARN cluster is Kerberos-secured
      security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://e5:8088

      # URL of the ProxyServer API
      proxy_api_url=http://e5:8088

      # URL of the HistoryServer API
      ## history_server_api_url=http://localhost:19888

      # URL of the Spark History Server
      ## spark_history_server_url=http://localhost:18088

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

    # HA support by specifying multiple clusters.
    # Redefine different properties there.
    # e.g.

    # [[[ha]]]  # HA (high-availability) cluster configuration
      # Resource Manager logical name (required for HA)
      ## logical_name=my-rm-name

      # Un-comment to enable
      ## submit_to=True

      # URL of the ResourceManager API
      ## resourcemanager_api_url=http://localhost:8088

      # ...

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]

    [[[default]]]
      # Enter the host on which you are running the Hadoop JobTracker
      jobtracker_host=e5

      # The port where the JobTracker IPC listens on
      jobtracker_port=8050

      # JobTracker logical name for HA
      ## logical_name=

      # Thrift plug-in port for the JobTracker
      ## thrift_port=9290

      # Whether to submit jobs to this cluster
      submit_to=False

      # Change this if your MapReduce cluster is Kerberos-secured
      security_enabled=false
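With the YARN section filled in, a quick call to the ResourceManager REST API verifies the resourcemanager_api_url above (a sketch using this cluster's host):

root@dell:~# curl -s http://e5:8088/ws/v1/cluster/info

A JSON response describing the cluster state indicates the URL is correct.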

This covers only the Hadoop configuration. Other services such as Oozie and Sqoop are just as simple; if you need them, configure the corresponding sections as well.
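For example, wiring up the Hive editor takes only a couple of lines in the [beeswax] section of hue.ini (a sketch; the host e5 and the default HiveServer2 port 10000 are assumptions for this cluster):

[beeswax]
  # Host where HiveServer2 is running
  hive_server_host=e5
  # Port where the HiveServer2 Thrift server listens
  hive_server_port=10000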

Start the Hue service

Execute the following command to start Hue in development/debug mode:

# 0.0.0.0 allows connections from any host; without it, the server only accepts local access on 127.0.0.1
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue runserver 0.0.0.0:8888

Once startup succeeds, open http://server-ip:8888 in a browser and you will be redirected to the Hue login page. If no account has been set up, the default is admin/admin; if you created an admin user during the database sync above, log in with that username and password.
(Screenshot: the Hue login page)

With login working, we can register Hue as a system service managed by systemd, making it easy to start, stop, and enable at boot.

The script below is based on the Hue init script shipped with CDH; it works with only a few small modifications.

/etc/init.d/hue

#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#       /etc/rc.d/init.d/hue
#
#       Hue web server
#
# chkconfig: 2345 90 10
# description: Hue web server
# pidfile: /var/run/hue/supervisor.pid

. /etc/init.d/functions

LOCKFILE=/var/lock/subsys/hue
#DAEMON=/usr/lib/hue/build/env/bin/supervisor # Introduce the server's location here
DAEMON=/data/hue/hue-3.12.0/build/env/bin/supervisor # Introduce the server's location here
LOGDIR=/var/log/hue  # Log directory to use
PIDFILE=/var/run/hue/supervisor.pid
USER=hue
#EXEC=/usr/lib/hue/build/env/bin/python
EXEC=/data/hue/hue-3.12.0/build/env/bin/python
DAEMON_OPTS="-p $PIDFILE -l $LOGDIR -d"
HUE_SHUTDOWN_TIMEOUT=15

hue_start() {
        export PYTHON_EGG_CACHE='/tmp/.hue-python-eggs'
        #RE_REGISTER=/usr/lib/hue/.re_register
        RE_REGISTER=/data/hue/hue-3.12.0/app.reg
        if [ -e $RE_REGISTER ]; then
            # Do app_reg on upgraded apps. This is a workaround for DISTRO-11.
            # We can probably take it out after another release.
            DO="/sbin/runuser -s /bin/bash $USER -c"
            #APP_REG="/usr/lib/hue/tools/app_reg/app_reg.py"
            APP_REG="/data/hue/hue-3.12.0/tools/app_reg/app_reg.py"
            # Upgraded apps write their paths in the re_rgister file.
            RE_REG_LOG=/var/log/hue/hue_re_register.log

            # Make cwd somewhere that $USER can chdir into
            pushd / > /dev/null
            $DO "DESKTOP_LOG_DIR=$LOGDIR $EXEC $APP_REG --install $(cat $RE_REGISTER | xargs echo -n)  >> $RE_REG_LOG 2>&1"
            ok=$?
            popd > /dev/null
            if [ $ok -eq 0 ] ; then
                rm -f $RE_REGISTER
            else
                echo "Failed to register some apps: Details in $RE_REG_LOG"
            fi
        fi

        echo -n "Starting hue: "
        for dir in $(dirname $PIDFILE) $LOGDIR ${PYTHON_EGG_CACHE}
        do
            mkdir -p $dir
            chown -R $USER $dir
        done

        # Check if already running
        if [ -e $PIDFILE ] && checkpid $(cat $PIDFILE) ; then
            echo "already running"
            return 0
        fi
        # the supervisor itself will setuid down to $USER
        su -s /bin/bash $USER -c "$DAEMON $DAEMON_OPTS"
        ret=$?
        base=$(basename $0)
        if [ $ret -eq 0 ]; then
            sleep 5
            test -e $PIDFILE && checkpid $(cat $PIDFILE)
            ret=$?
        fi
        if [ $ret -eq 0 ]; then
            touch $LOCKFILE
            success $"$base startup"
        else
            failure $"$base startup"
        fi
        echo
        return $ret
}

hue_stop() {
        if [ ! -e $PIDFILE ]; then
            success "Hue is not running"
            return 0
        fi

        echo -n "Shutting down hue: "

        HUE_PID=`cat $PIDFILE 2>/dev/null`
        if [ -n "$HUE_PID" ]; then
          kill -TERM ${HUE_PID} &>/dev/null
          for i in `seq 1 ${HUE_SHUTDOWN_TIMEOUT}` ; do
            kill -0 ${HUE_PID} &>/dev/null || break
            sleep 1
          done
          kill -KILL ${HUE_PID} &>/dev/null
        fi
        echo
        rm -f $LOCKFILE $PIDFILE
        return 0
}

hue_restart() {
  hue_stop
  hue_start
}

case "$1" in
    start)
        hue_start
        ;;
    stop)
        hue_stop
        ;;
    status)
        status -p $PIDFILE supervisor
        ;;
    restart|reload)
        hue_restart
        ;;
    condrestart)
        [ -f $LOCKFILE ] && hue_restart || :
        ;;
    *)
        echo "Usage: hue {start|stop|status|reload|restart|condrestart"
        exit 1
        ;;
esac
exit $?

Mainly change the values of the following variables to match your installation paths:

DAEMON
EXEC

RE_REGISTER
APP_REG

Note that the above script must be placed under the /etc/init.d/ directory.
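Two prerequisites are worth checking first (a sketch; adjust to your environment): the script runs the supervisor as the hue user, so that account must exist, and the script itself must be executable and registered:

root@dell:~# id hue &>/dev/null || useradd hue   # create the service account if missing
root@dell:~# chmod +x /etc/init.d/hue            # make the init script executable
root@dell:~# chkconfig --add hue                 # register it with the init system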
After that, systemd picks the script up through its SysV compatibility layer, and we can control the service with systemctl commands such as:

# start the service
# systemctl start hue

# stop the service
# systemctl stop hue

# enable the service at boot
# systemctl enable hue
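To confirm the service came up, check its status and the supervisor log (a sketch; the exact log file names under /var/log/hue may vary):

# systemctl status hue
# tail -n 50 /var/log/hue/supervisor.log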

Summary

Installing and configuring Hue is fairly easy, but the first attempt inevitably runs into assorted problems: the Python version, database configuration, the configuration of the various big data services, and so on. Don't panic when you hit one; read the error message first, then the error log, then search Baidu or Google, or check the official documentation. There is always a solution. Finally, if your company's servers are well provisioned, CDH is worth considering: its services are more complete and it has commercial backing, so even difficult problems can be referred to Cloudera.
