Introduction
Hue is an open-source web UI for Apache Hadoop. It originally evolved from Cloudera Desktop and was contributed to the open-source community by Cloudera. It is built on the Python web framework Django. With Hue, we can interact with a Hadoop cluster from a browser-based console to analyze and process data: manipulating files on HDFS, running MapReduce jobs, issuing Hive queries, and so on.
If you use the CDH platform, the Hue service is integrated by default, but the HDP platform does not include it. Logging in to the server for every small query is extremely inconvenient, so I simply installed Hue myself.
Environment
- CentOS 7
- HDP 2.5
- Hue 3.12.0
Install Hue
Download version 3.12.0 from the official website:
http://gethue.com/hue-3-12-the-improved-editor-for-sql-developers-and-analysts-is-out/
Downloading from there can be very slow in China; I happened to have downloaded it in advance and have uploaded it to Baidu Cloud:
https://pan.baidu.com/s/1cCifuu
Put it in a directory of your choice on the server and extract it:
root@dell:/data/hue# tar -zxvf hue-3.12.0.tgz
Install dependencies:
root@dell:~# yum install ant gcc gcc-c++ mysql-devel openssl-devel cyrus-sasl-devel cyrus-sasl cyrus-sasl-gssapi sqlite-devel openldap-devel libacl-devel libxml2-devel libxslt-devel mvn krb5-devel python-devel python-simplejson python-setuptools
Compile and install hue:
root@dell:/data/hue# PREFIX=/usr/share make install
PREFIX specifies the installation path; it is best to put it on a partition with plenty of free space.
The installation process itself is straightforward. The main prerequisite is to configure Maven in advance: during compilation many JAR packages are needed, and they are downloaded through Maven.
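If Maven downloads are slow (as they often are in China), you can route requests for Maven Central through a mirror in ~/.m2/settings.xml. The Aliyun public mirror below is one commonly used option; treat the mirror URL as an example, not a requirement:

```xml
<!-- ~/.m2/settings.xml: send requests for Maven Central through a faster mirror -->
<settings>
  <mirrors>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Aliyun public mirror</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
```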
Configure Hue
Edit the configuration file under the Hue installation path:
desktop/conf/hue.ini
Configure the database
By default, Hue uses a SQLite database; you can switch it to MySQL.
Open the hue.ini file and find the following:
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
# Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
# Note for MariaDB use the 'mysql' engine.
## engine=sqlite3 // change to mysql
## host= // MySQL server hostname or IP
## port= // 3306
## user= // database user (creating a dedicated hue user is recommended)
## password= // password for that user
# Execute this script to produce the database password. This will be used when 'password' is not set.
## password_script=/path/script
## name=desktop/desktop.db // change this to the database name, e.g. hue
## options={}
# Database schema, to be used only when public schema is revoked in postgres
## schema=
After modification, it looks like this:
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
# Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
# Note for MariaDB use the 'mysql' engine.
engine=mysql
host=192.168.1.2 # MySQL server hostname or IP
port=3306
user=hue # database user (creating a dedicated hue user is recommended)
password=lu123456 # password for that user
# Execute this script to produce the database password. This will be used when 'password' is not set.
## password_script=/path/script
name=hue # the database name, e.g. hue
## options={}
# Database schema, to be used only when public schema is revoked in postgres
## schema=
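Before the sync step below, the hue database and user referenced above must actually exist on the MySQL server. A minimal sketch, reusing the names and password from the configuration above (adjust the host pattern and credentials to your environment):

```sql
-- Run on the MySQL server (192.168.1.2) as root.
CREATE DATABASE hue DEFAULT CHARACTER SET utf8;
-- Allow the hue user to connect from any host; tighten '%' if you prefer.
CREATE USER 'hue'@'%' IDENTIFIED BY 'lu123456';
GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
FLUSH PRIVILEGES;
```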
After configuring the database, we also need to synchronize and migrate the schema into the MySQL database we specified:
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue syncdb
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue migrate
During the synchronization you may be prompted to create an admin user; that account will be needed to log in to Hue later.
Hadoop configuration
Edit hdfs-site.xml and add the following property:
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
On HDP this should be enabled by default: HDFS -> Configs -> Advanced -> General -> WebHDFS enabled.
Add a proxy-user entry for hue: HDFS -> Configs -> Advanced -> Custom core-site, then add the properties:
hadoop.proxyuser.hue.groups=*
hadoop.proxyuser.hue.hosts=*
Without these, you cannot submit jobs through Hue.
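On a cluster managed without Ambari, the same two properties go into core-site.xml directly (a NameNode restart, or at least a configuration refresh, is typically needed afterwards):

```xml
<!-- core-site.xml: let the hue user impersonate other users from any host/group -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
```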
Hue configuration: edit hue.ini and find the following:
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://e5:8020 # NameNode address
# fs_defaultfs=hdfs://localhost:8020
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://e5:50070/webhdfs/v1 # WebHDFS URL
# Change this if your HDFS cluster is Kerberos-secured
security_enabled=false
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
# Directory of the Hadoop configuration
## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
hadoop_conf_dir=/etc/hadoop/conf # Hadoop configuration path
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=e5
# See YARN -> Configs -> Advanced -> Advanced yarn-site,
# property yarn.resourcemanager.address
# The port where the ResourceManager IPC listens on
resourcemanager_port=8050
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://e5:8088
# URL of the ProxyServer API
proxy_api_url=http://e5:8088
# URL of the HistoryServer API
## history_server_api_url=http://localhost:19888
# URL of the Spark History Server
## spark_history_server_url=http://localhost:18088
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
# HA support by specifying multiple clusters.
# Redefine different properties there.
# e.g.
# [[[ha]]] // HA (high-availability) cluster configuration
# Resource Manager logical name (required for HA)
## logical_name=my-rm-name
# Un-comment to enable
## submit_to=True
# URL of the ResourceManager API
## resourcemanager_api_url=http://localhost:8088
# ...
# Configuration for MapReduce (MR1)
# ------------------------------------------------------------------------
[[mapred_clusters]]
[[[default]]]
# Enter the host on which you are running the Hadoop JobTracker
jobtracker_host=e5
# The port where the JobTracker IPC listens on
jobtracker_port=8050
# JobTracker logical name for HA
## logical_name=
# Thrift plug-in port for the JobTracker
## thrift_port=9290
# Whether to submit jobs to this cluster
submit_to=False
# Change this if your MapReduce cluster is Kerberos-secured
security_enabled=false
This covers only the Hadoop configuration. Other services such as Oozie and Sqoop are just as simple to wire up; if you need them, configure the corresponding sections in the same file.
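For example, to use the Hive editor, the [beeswax] section of hue.ini points Hue at HiveServer2. A sketch, assuming HiveServer2 runs on host e5 on the default port 10000 (both are assumptions; substitute your own host and port):

```ini
[beeswax]
  # Host and port where HiveServer2 is listening (hypothetical host e5)
  hive_server_host=e5
  hive_server_port=10000
  # Directory containing hive-site.xml
  hive_conf_dir=/etc/hive/conf
```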
Start the Hue service
To start in development/debug mode, execute the following command:
# 0.0.0.0 allows connections from any host; without it, only 127.0.0.1 (localhost) can connect
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue runserver 0.0.0.0:8888
After the initial startup succeeds, open http://server-ip:8888 in a browser; it will redirect to the Hue login page. If no account has been set up, the default is admin/admin; if you created an admin user during the database sync above, log in with that username and password.
Once login works, we can register Hue as a system service managed by systemd, making it easy to start, stop, and launch automatically at boot.
The script below is based on the hue init script shipped with CDH; it works with only a few path modifications.
/etc/init.d/hue
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# /etc/rc.d/init.d/hue
#
# Hue web server
#
# chkconfig: 2345 90 10
# description: Hue web server
# pidfile: /var/run/hue/supervisor.pid
. /etc/init.d/functions
LOCKFILE=/var/lock/subsys/hue
#DAEMON=/usr/lib/hue/build/env/bin/supervisor # Introduce the server's location here
DAEMON=/data/hue/hue-3.12.0/build/env/bin/supervisor # Introduce the server's location here
LOGDIR=/var/log/hue # Log directory to use
PIDFILE=/var/run/hue/supervisor.pid
USER=hue
#EXEC=/usr/lib/hue/build/env/bin/python
EXEC=/data/hue/hue-3.12.0/build/env/bin/python
DAEMON_OPTS="-p $PIDFILE -l $LOGDIR -d"
HUE_SHUTDOWN_TIMEOUT=15
hue_start() {
export PYTHON_EGG_CACHE='/tmp/.hue-python-eggs'
#RE_REGISTER=/usr/lib/hue/.re_register
RE_REGISTER=/data/hue/hue-3.12.0/app.reg
if [ -e $RE_REGISTER ]; then
# Do app_reg on upgraded apps. This is a workaround for DISTRO-11.
# We can probably take it out after another release.
DO="/sbin/runuser -s /bin/bash $USER -c"
#APP_REG="/usr/lib/hue/tools/app_reg/app_reg.py"
APP_REG="/data/hue/hue-3.12.0/tools/app_reg/app_reg.py"
# Upgraded apps write their paths in the re_register file.
RE_REG_LOG=/var/log/hue/hue_re_register.log
# Make cwd somewhere that $USER can chdir into
pushd / > /dev/null
$DO "DESKTOP_LOG_DIR=$LOGDIR $EXEC $APP_REG --install $(cat $RE_REGISTER | xargs echo -n) >> $RE_REG_LOG 2>&1"
ok=$?
popd > /dev/null
if [ $ok -eq 0 ] ; then
rm -f $RE_REGISTER
else
echo "Failed to register some apps: Details in $RE_REG_LOG"
fi
fi
echo -n "Starting hue: "
for dir in $(dirname $PIDFILE) $LOGDIR ${PYTHON_EGG_CACHE}
do
mkdir -p $dir
chown -R $USER $dir
done
# Check if already running
if [ -e $PIDFILE ] && checkpid $(cat $PIDFILE) ; then
echo "already running"
return 0
fi
# the supervisor itself will setuid down to $USER
su -s /bin/bash $USER -c "$DAEMON $DAEMON_OPTS"
ret=$?
base=$(basename $0)
if [ $ret -eq 0 ]; then
sleep 5
test -e $PIDFILE && checkpid $(cat $PIDFILE)
ret=$?
fi
if [ $ret -eq 0 ]; then
touch $LOCKFILE
success $"$base startup"
else
failure $"$base startup"
fi
echo
return $ret
}
hue_stop() {
if [ ! -e $PIDFILE ]; then
success "Hue is not running"
return 0
fi
echo -n "Shutting down hue: "
HUE_PID=`cat $PIDFILE 2>/dev/null`
if [ -n "$HUE_PID" ]; then
kill -TERM ${HUE_PID} &>/dev/null
for i in `seq 1 ${HUE_SHUTDOWN_TIMEOUT}` ; do
kill -0 ${HUE_PID} &>/dev/null || break
sleep 1
done
kill -KILL ${HUE_PID} &>/dev/null
fi
echo
rm -f $LOCKFILE $PIDFILE
return 0
}
hue_restart() {
hue_stop
hue_start
}
case "$1" in
start)
hue_start
;;
stop)
hue_stop
;;
status)
status -p $PIDFILE supervisor
;;
restart|reload)
hue_restart
;;
condrestart)
[ -f $LOCKFILE ] && hue_restart || :
;;
*)
echo "Usage: hue {start|stop|status|reload|restart|condrestart}"
exit 1
;;
esac
exit $?
Mainly change the values of the following variables to match your installation path:
DAEMON
EXEC
RE_REGISTER
APP_REG
Note that the script must be placed in the /etc/init.d/ directory and made executable (chmod +x /etc/init.d/hue).
After that, we can control the service with systemctl (systemd picks up init.d scripts through its SysV compatibility layer), for example:
# start the service
systemctl start hue
# stop the service
systemctl stop hue
# start the service automatically at boot
systemctl enable hue
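Alternatively, on a systemd host you can skip the SysV script entirely and write a small native unit. A sketch, assuming the install path used above and a hue system user (path and user are assumptions; adjust as needed):

```ini
# /etc/systemd/system/hue.service
[Unit]
Description=Hue web server
After=network.target

[Service]
User=hue
# Without -d, supervisor stays in the foreground and manages the Hue processes,
# which is what systemd expects of a simple service.
ExecStart=/data/hue/hue-3.12.0/build/env/bin/supervisor
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After creating the file, run systemctl daemon-reload, then enable and start the unit as shown above.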
Summary
Installing and configuring Hue is relatively easy, but on first contact you will inevitably run into problems: the Python version, the database configuration, the settings for each big-data component, and so on. Don't panic when something breaks; read the error message first, then the error log, then search (Baidu/Google) or check the official documentation. There is always a solution. Finally, if your company's servers can afford it, CDH is the better choice: its services are more complete and it has commercial backing, so even difficult problems can be escalated to Cloudera.