Airflow deployment (the most complete illustrated guide on the web)

This article covers deploying and re-deploying Airflow, the pitfalls encountered along the way, and how to solve them.

  • Preparing the Environment


  • Python Installation

You may run into all kinds of problems while installing Python. The fixes found online are inconsistent and, more importantly, most of them do not work. I have summarized the key points from my own installation below. Follow the procedure in order without skipping steps, and make sure the environment is clean: if any step reports that something "already exists", delete it and run that step again. (This is a lesson learned the hard way.)

 

# Python build dependencies
yum -y install zlib zlib-devel
yum -y install bzip2 bzip2-devel
yum -y install ncurses ncurses-devel
yum -y install readline readline-devel
yum -y install openssl openssl-devel
yum -y install openssl-static
yum -y install xz lzma xz-devel
yum -y install sqlite sqlite-devel
yum -y install gdbm gdbm-devel
yum -y install tk tk-devel
yum -y install gcc

# Install the wget command
yum -y install wget
# Use wget to download the Python source archive into the /root directory
wget -P /root https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
# Extract the Python source archive into /root
tar -zxvf /root/Python-3.6.5.tgz -C /root
# Enter the extracted directory
cd /root/Python-3.6.5
# Configure the build and set the install prefix
./configure --with-ssl --prefix=/service/python3
# Compile the Python source code
make
# Install Python
make install
# Back up the original python symbolic link
mv /usr/bin/python /usr/bin/python2.backup
# Create a new symbolic link pointing to Python 3
ln -s /service/python3/bin/python3 /usr/bin/python
# Create a symbolic link for pip
ln -s /service/python3/bin/pip3 /usr/bin/pip
### Note: this step may fail because the link already exists. If so, delete /usr/bin/pip and run the step again.
### The error looks like this:
### ln: failed to create symbolic link '/usr/bin/pip': File exists
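
The "File exists" failure described above can be avoided by removing any existing link before creating the new one. The following Python sketch illustrates that idea. `force_symlink` is a hypothetical helper name introduced here for illustration, not part of any tool used in this article, and the demo runs in a scratch directory rather than in /usr/bin.

```python
import os
import tempfile


def force_symlink(target, link_path):
    """Create link_path -> target, replacing link_path if it already exists."""
    # lexists() is True even for a dangling symlink, which exists() would miss.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)


if __name__ == "__main__":
    # Demonstrate in a temporary directory instead of touching /usr/bin.
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "pip")
        force_symlink("/service/python3/bin/pip3", link)
        # Running it a second time does not raise "File exists".
        force_symlink("/service/python3/bin/pip3", link)
        print(os.readlink(link))
```

On the real system the call would be `force_symlink("/service/python3/bin/pip3", "/usr/bin/pip")`, run as root.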

 

# View the Python version
python -V
# Check that pip is available
pip
# Upgrade pip
pip install --upgrade pip
# Find the location of the yum command
whereis yum
#yum: /usr/bin/yum /etc/yum /etc/yum.conf /usr/share/man/man8/yum.8
# yum is written for Python 2, so it must keep using the system Python 2 now that /usr/bin/python points to Python 3
# Edit the yum script
vi /usr/bin/yum
# Enter insert mode
i
# Change the first line (the version depends on the OS: 2.7 on CentOS 7, 2.6 on CentOS 6)
# Before:
#!/usr/bin/python
# After:
#!/usr/bin/python2.7
# Leave insert mode
esc
# Save the file
:wq
# Edit the following file the same way, changing its first line
/usr/libexec/urlgrabber-ext-down

Be careful not to make a mistake when editing the yum files; this step needs caution. Otherwise later yum commands may fail and packages cannot be downloaded, with an error like this:

Error downloading packages:
python3-rpm-generators-6-2.el7.noarch: [Errno 5] [Errno 2] No such file or directory
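
Because a typo in the first line of the yum script is easy to make in an editor, the change can also be expressed as a small text transformation. The sketch below is only an illustration of the edit described above. `replace_shebang` is a hypothetical helper, and it works on a string so that it can be tried safely before touching /usr/bin/yum.

```python
def replace_shebang(script_text, new_interpreter):
    """Return script_text with its first line replaced by a shebang for new_interpreter."""
    lines = script_text.split("\n")
    # Refuse to touch files whose first line is not a shebang at all.
    if not lines[0].startswith("#!"):
        raise ValueError("first line is not a shebang")
    lines[0] = "#!" + new_interpreter
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "#!/usr/bin/python\nimport sys\n"
    print(replace_shebang(sample, "/usr/bin/python2.7"))
```

To apply it for real, read the file, pass its contents through the function, and write the result back, keeping a backup copy of the original file first.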


After completing the steps above, run the pip command to check that it works.

If it prints its usage help, it is working normally.
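
The same check can be scripted. The following sketch, under the assumption that the commands of interest are `python`, `pip` and `yum`, reports where each one resolves on the PATH. `find_tools` is a hypothetical helper name.

```python
import shutil
import sys


def find_tools(names):
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Running under Python %d.%d.%d" % sys.version_info[:3])
    for name, path in find_tools(["python", "pip", "yum"]).items():
        print(name, "->", path or "NOT FOUND")
```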

 

 

 

  • MySQL Installation

MySQL can be installed in two ways: from RPM packages, or from a tar package.

This article uses the simpler RPM deployment.

If that fails, or if you want an installation that is sure to succeed on the first attempt, use the tar package deployment instead. For the tar deployment, see:

https://www.cnblogs.com/xuziyu/p/10353968.html

 

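# Uninstall mariadb (use the exact package name printed by the query; the version may differ)
rpm -qa | grep mariadb
rpm -e --nodeps mariadb-libs-5.5.52-1.el7.x86_64
#sudo rpm -e --nodeps mariadb-libs-5.5.52-1.el7.x86_64
# Query again to confirm that nothing is left
rpm -qa | grep mariadb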

 

# Download the MySQL repo package
wget -P /root http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
# Install the repo package with rpm
rpm -ivh /root/mysql-community-release-el7-5.noarch.rpm
# Install MySQL
yum install mysql-server
# Give the mysql user ownership of the data directory
chown -R mysql:mysql /var/lib/mysql
# Start the MySQL service
service mysqld start
# Log in to MySQL as root
mysql -uroot
# Reset the MySQL root password
use mysql;
update user set password=password('root') where user='root';
flush privileges;
# Create a database and users for Airflow
# Create the database:
create database airflow;
# Create the users:
create user 'airflow'@'%' identified by 'airflow';
create user 'airflow'@'localhost' identified by 'airflow';
# Grant privileges to the users:
grant all on airflow.* to 'airflow'@'%';
grant all on airflow.* to 'root'@'%';
flush privileges;
exit;

 

 

Official site: http://airflow.apache.org/

 

 

 

Airflow became an Apache top-level project in January 2019. It is a task scheduling framework written in Python.

 

  • Installing Airflow

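# Set a temporary environment variable
export SLUGIFY_USES_TEXT_UNIDECODE=yes
# Edit /etc/profile to add permanent environment variables
vi /etc/profile
# Append the following at the end of the file:
----
export PS1="[\u@\h \w]\$ "
# Python environment variables
export PYTHON_HOME=/service/python3
export PATH=$PATH:$PYTHON_HOME/bin
# Airflow environment variables
export AIRFLOW_HOME=/root/airflow
export SITE_AIRFLOW_HOME=/service/python3/lib/python3.6/site-packages/airflow
export PATH=$PATH:$SITE_AIRFLOW_HOME/bin
----
# Apply the environment variables
source /etc/profile
# Install apache-airflow, pinned to version 1.10.0
pip install apache-airflow==1.10.0
(If this step runs through without errors, feel free to celebrate for a while. It is a hard one.)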

 

Airflow is installed into the site-packages directory of python3. The full path is:

${PYTHON_HOME}/lib/python3.6/site-packages/airflow
# Absolute path: /service/python3/lib/python3.6/site-packages/airflow
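
Rather than hard-coding the `python3.6` part of that path, the site-packages directory can be queried from the interpreter itself. This is a minimal sketch using the standard `sysconfig` module. `airflow_package_dir` is a hypothetical helper name, and the result only matches the path above when the script is run with the Python installed under /service/python3.

```python
import os
import sysconfig


def airflow_package_dir():
    """Return the directory where a pip-installed airflow package would live."""
    # "purelib" is the site-packages directory for pure-Python packages.
    site_packages = sysconfig.get_paths()["purelib"]
    return os.path.join(site_packages, "airflow")


if __name__ == "__main__":
    path = airflow_package_dir()
    status = "exists" if os.path.isdir(path) else "not installed in this interpreter"
    print(path, "(" + status + ")")
```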

 

Run the airflow command to initialize it

 

 Solution: see these blog posts: https://www.cnblogs.com/wang-li/p/7620483.html

                   https://blog.csdn.net/yingkongshi99/article/details/52658538

airflow
####
[2019-07-17 04:40:01,565] {__init__.py:51} INFO - Using executor SequentialExecutor
usage: airflow [-h]
               {backfill,list_tasks,clear,pause,unpause,trigger_dag,delete_dag,pool,variables,kerberos,render,run,initdb,list_dags,dag_state,task_failed_deps,task_state,serve_logs,test,webserver,resetdb,upgradedb,scheduler,worker,flower,version,connections,create_user}
               ...
airflow: error: the following arguments are required: subcommand
####
# At this point airflow has generated some files in the AIRFLOW_HOME directory set earlier.
# The "arguments are required: subcommand" error shown above appears because no subcommand was given; it can be ignored.
# The generated files are listed below:
[root@test01 ~]$ cd airflow/
[root@test01 ~/airflow]$ ll
total 28
-rw-r--r--. 1 root root 20572 Jul 17 04:40 airflow.cfg
drwxr-xr-x. 3 root root    23 Jul 17 04:40 logs
-rw-r--r--. 1 root root  2299 Jul 17 04:40 unittests.cfg

 

# Install the MySQL module for Airflow
pip install 'apache-airflow[mysql]'

 

# This may fail with the following error:
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: /bin/sh: mysql_config: command not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-dq81ujxt/mysqlclient/setup.py", line 16, in <module>
        metadata, options = get_config()
      File "/tmp/pip-install-dq81ujxt/mysqlclient/setup_posix.py", line 51, in get_config
        libs = mysql_config("libs")
      File "/tmp/pip-install-dq81ujxt/mysqlclient/setup_posix.py", line 29, in mysql_config
        raise EnvironmentError("%s not found" % (_mysql_config_path,))
    OSError: mysql_config not found
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-dq81ujxt/mysqlclient/
# Fix: check whether a mysql_config file exists
[root@test01 ~]$ find / -name mysql_config
# If it does not exist, install mysql-devel
[root@test01 ~]$ yum -y install mysql-devel
# After installing, check again that mysql_config exists
find / -name mysql_config
# Use MySQL as Airflow's metadata database
pip install mysqlclient
# Try to install MySQLdb
pip install MySQLdb
# This fails because no such package is available:
Collecting MySQLdb
  ERROR: Could not find a version that satisfies the requirement MySQLdb (from versions: none)
ERROR: No matching distribution found for MySQLdb
# So install pymysql instead
pip install pymysql
pip install cryptography
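
A note on the names involved: the `mysqlclient` distribution installed above is the one that provides the importable module `MySQLdb`, while `pymysql` provides a module of the same name. The following sketch checks which driver modules can actually be imported in the current interpreter. `importable_modules` is a hypothetical helper name.

```python
import importlib.util


def importable_modules(candidates):
    """Return the names from candidates that can be imported in this interpreter."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    # mysqlclient installs the module MySQLdb; pymysql installs the module pymysql.
    print(importable_modules(["MySQLdb", "pymysql"]))
```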
# To avoid this error later on:
# airflow.exceptions.AirflowException: Could not create Fernet object: Incorrect padding
# change fernet_key in airflow.cfg (located in ~/airflow/ by default)
# To generate a new key:
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# This command prints a key. Copy it and use it to replace the value of fernet_key in airflow.cfg.
# The command may fail with:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'cryptography'
# Fix:
pip install cryptography
# Edit the fernet_key value in the file
cd  ${AIRFLOW_HOME}
vi airflow.cfg
# Search for fernet_key
/fernet_key
# Replace the fernet_key value with the generated key
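
To see why a bad `fernet_key` value leads to an "Incorrect padding" error, it helps to know the key format: a Fernet key is 32 random bytes encoded as URL-safe base64. The sketch below reproduces that format with the standard library only and checks whether a pasted value has the right shape. Both function names are hypothetical helpers. The `cryptography` one-liner above remains the method this article uses; this is only an illustration for checking a value before saving airflow.cfg.

```python
import base64
import binascii
import os


def generate_fernet_key():
    """Return a new key in Fernet format: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


def is_valid_fernet_key(key):
    """Check that key decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode())
    except (binascii.Error, ValueError):
        # Malformed base64 raises binascii.Error, for example "Incorrect padding".
        return False
    return len(raw) == 32


if __name__ == "__main__":
    key = generate_fernet_key()
    print(key, is_valid_fernet_key(key))
    # A truncated or hand-typed value fails the check.
    print(is_valid_fernet_key("abc"))
```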

 

 

 

 

 

# Change the sql_alchemy_conn setting in airflow.cfg
sql_alchemy_conn = mysql+mysqldb://airflow:airflow@localhost:3306/airflow
# Save the file
# To avoid the following error when initializing the database:
# Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
# edit the MySQL configuration file my.cnf
# Find the location of my.cnf
mysql --help | grep my.cnf
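
The `sql_alchemy_conn` value set above follows the pattern `mysql+<driver>://<user>:<password>@<host>:<port>/<database>`. The sketch below assembles such a URL from its parts and percent-encodes the credentials, which matters once a password contains characters such as `@` or `:`. `build_sql_alchemy_conn` is a hypothetical helper, and the second password in the demo is made up for illustration.

```python
from urllib.parse import quote_plus


def build_sql_alchemy_conn(user, password, host, port, database, driver="mysqldb"):
    """Assemble a SQLAlchemy MySQL URL of the form used for sql_alchemy_conn."""
    # Percent-encode credentials so characters such as '@' or ':' do not break the URL.
    return "mysql+{driver}://{user}:{password}@{host}:{port}/{database}".format(
        driver=driver,
        user=quote_plus(user),
        password=quote_plus(password),
        host=host,
        port=port,
        database=database,
    )


if __name__ == "__main__":
    # The values used in this article.
    print(build_sql_alchemy_conn("airflow", "airflow", "localhost", 3306, "airflow"))
    # A made-up password with special characters, using the pymysql driver.
    print(build_sql_alchemy_conn("airflow", "p@ss:word", "localhost", 3306, "airflow", driver="pymysql"))
```

Percent signs in airflow.cfg values may need additional escaping depending on how the file is parsed, so check this before using an encoded password there.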

 

 

 

# Edit my.cnf
vi /etc/my.cnf
# Add the following setting under [mysqld] (make sure it goes in the right section):
explicit_defaults_for_timestamp=true

 

 

 

# Restart the MySQL service so the setting takes effect
service mysqld restart
# Check that the setting took effect
mysql -uroot -proot
mysql> select @@global.explicit_defaults_for_timestamp;
+------------------------------------------+
| @@global.explicit_defaults_for_timestamp |
+------------------------------------------+
|                                        1 |
+------------------------------------------+
1 row in set (0.00 sec)
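
Since the setting only works when it sits under `[mysqld]`, a quick check of the file can catch a misplaced line before the service is restarted. The following sketch scans my.cnf-style text line by line. `option_in_section` is a hypothetical helper, and the sample text in the demo is made up for illustration.

```python
def option_in_section(cnf_text, section, option):
    """Return True if option is set inside [section] of a my.cnf-style text."""
    current = None
    for raw_line in cnf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        name = line.split("=", 1)[0].strip()
        # MySQL treats '-' and '_' in option names as equivalent.
        if current == section and name.replace("-", "_") == option.replace("-", "_"):
            return True
    return False


if __name__ == "__main__":
    sample = (
        "[mysqld]\n"
        "datadir=/var/lib/mysql\n"
        "explicit_defaults_for_timestamp=true\n"
        "[mysqld_safe]\n"
        "log-error=/var/log/mysqld.log\n"
    )
    print(option_in_section(sample, "mysqld", "explicit_defaults_for_timestamp"))
```

For the real file, pass in the contents of /etc/my.cnf read with `open("/etc/my.cnf").read()`.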

 

Adjust the configuration by editing airflow.cfg

1. Change the webserver address

base_url = http://IP:8080

web_server_port = 8080

 

 

2. Change the executor

 

 

3. Time zone

# In airflow.cfg, set:

default_timezone = Asia/Shanghai

 

 

Other settings

# Three other files also need to be changed
# Change the time displayed in the top-right corner of the webserver page:
vi ${PYTHON_HOME}/lib/python3.6/site-packages/airflow/www/templates/admin/master.html
-----------------------------------
{% block tail_js %}
{{ super() }}
<script src="{{ url_for('static', filename='jqClock.min.js') }}" type="text/javascript"></script>
<script>
    x = new Date()
   // var UTCseconds = (x.getTime() + x.getTimezoneOffset()*60*1000); // original line, now commented out
    var UTCseconds = x.getTime(); // changed line
    $("#clock").clock({

 

 

# Change the lastRun time shown in the webserver:
vi ${PYTHON_HOME}/lib/python3.6/site-packages/airflow/models.py
-----------------------------------
# Add the following at the indicated location; searching for get_last_dagrun helps to find it
def utc2local(self,utc):
       import time
       epoch = time.mktime(utc.timetuple())
       offset = datetime.fromtimestamp(epoch) - datetime.utcfromtimestamp(epoch)
       return utc + offset
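
The `utc2local` method above derives the offset from the time zone of the machine it runs on. As a standalone illustration of the same conversion, the sketch below uses a fixed UTC+8 offset instead, on the assumption that the target zone is Asia/Shanghai, which has no daylight saving time. `utc_to_shanghai` is a hypothetical helper and not part of Airflow.

```python
from datetime import datetime, timedelta, timezone

# Assumption: China Standard Time is a fixed UTC+8 offset with no daylight saving.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def utc_to_shanghai(naive_utc):
    """Interpret a naive datetime as UTC and return the naive Asia/Shanghai wall time."""
    aware = naive_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(SHANGHAI).replace(tzinfo=None)


if __name__ == "__main__":
    run_time = datetime(2019, 7, 17, 4, 40)
    print(utc_to_shanghai(run_time).strftime("%Y-%m-%d %H:%M"))
```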

 

 

vi ${PYTHON_HOME}/lib/python3.6/site-packages/airflow/www/templates/airflow/dags.html
# Change the corresponding last_run expressions to the following
dag.utc2local(last_run.execution_date).strftime("%Y-%m-%d %H:%M")
dag.utc2local(last_run.start_date).strftime("%Y-%m-%d %H:%M")

 

 

 

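4. Add user authentication (I am skipping this step for now because I do not fully understand it yet)

Here we use the simple password authentication method
# (1) Install the password component:
sudo pip install 'apache-airflow[password]'
# (2) Edit the airflow.cfg configuration file:
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
# (3) Write a Python script that adds a user account:
# Create the file add_account.py: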
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser

user = PasswordUser(models.User())
user.username = 'airflow'
user.email = '[email protected]'
user.password = 'airflow'

session = settings.Session()
session.add(user)
session.commit()
session.close()
exit()
# Run add_account.py:
python add_account.py
# You will find one more record in the user table of the MySQL metadata database.

5. Change the number of scheduler threads to control concurrency

parallelism = 32

6. Change the interval for detecting new DAGs

min_file_process_interval = 5
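
The airflow.cfg changes listed in the numbered steps above can also be applied by a script instead of by hand in vi. The sketch below edits the file content as plain text so that comments and ordering are preserved. `set_option` is a hypothetical helper, and the section names in the demo (`[core]` for `parallelism`, `[scheduler]` for `min_file_process_interval`) are assumptions based on the Airflow 1.10 default file, so check them against your own airflow.cfg.

```python
def set_option(cfg_text, section, key, value):
    """Return cfg_text with `key = value` set inside [section].

    An existing assignment is replaced; otherwise a new line is added
    directly under the section header.
    """
    lines = cfg_text.split("\n")
    header_index = None
    in_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped[1:-1] == section
            if in_section:
                header_index = i
            continue
        # Only look at real assignments inside the wanted section, not comments.
        if in_section and not stripped.startswith("#") and "=" in stripped:
            if stripped.split("=", 1)[0].strip() == key:
                lines[i] = "{} = {}".format(key, value)
                return "\n".join(lines)
    if header_index is None:
        raise KeyError("section [{}] not found".format(section))
    lines.insert(header_index + 1, "{} = {}".format(key, value))
    return "\n".join(lines)


if __name__ == "__main__":
    cfg = "[core]\nparallelism = 16\n\n[scheduler]\nmin_file_process_interval = 0\n"
    cfg = set_option(cfg, "core", "parallelism", 32)
    cfg = set_option(cfg, "scheduler", "min_file_process_interval", 5)
    print(cfg)
```

Keep a backup copy of airflow.cfg before writing the modified text back.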
  • Initializing the metadata database and starting the components

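# Initialize the metadata database (this simply creates the tables Airflow depends on)
pip install celery
pip install 'apache-airflow[kubernetes]'
airflow initdb
# Alternatively use airflow resetdb (note that resetdb drops the existing tables before recreating them)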

 

 

 

# Preparation
# Turn off the Linux firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
# The Windows firewall needs to be turned off as well
# Database settings
mysql -uroot -proot
mysql> set password for 'root'@'localhost' =password('');
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on airflow.* to 'airflow'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on airflow.* to 'root'@'%';
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> exit;

# Start the components:
airflow webserver -D
#airflow scheduler -D
#airflow worker -D
#airflow flower -D
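
After starting the webserver as a daemon, it is useful to confirm that it is really listening before opening a browser. The following sketch tries a plain TCP connection. `port_is_open` is a hypothetical helper, and port 8080 in the demo is the `web_server_port` value set earlier in airflow.cfg, so adjust it if yours differs.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("webserver reachable:", port_is_open("127.0.0.1", 8080))
```

If this reports False on the server itself, the webserver did not start. If it reports True locally but the page does not load from another machine, the firewall settings are the more likely cause.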

 

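  • Viewing the web page

# Address (use your own host IP and port)
192.168.150.128:8085/admin/
# Test
You can select the airflow_db database and run a simple query to test:
select * from log;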

 

 

 

 

 


Origin www.cnblogs.com/xuziyu/p/11696549.html