Detailed Airflow setup process (personally tested + summary)
2022-07-22 08:22:00 【Good skin】
Official website: Apache Airflow
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
I set it up half a month ago and am writing it down now. Without further ado, let's start building; the whole process comes with many screenshots:
Environment preparation
System: CentOS 7
conda version: 4.8.2
Airflow version: 1.10.11
Begin to build
I will use conda to create an environment for Apache Airflow:
conda create -n airflow_env python=3.7
Switch to the new environment:
conda activate airflow_env
Build Airflow
For building Airflow, the official website has a detailed set of documents: Airflow build
Then follow it like this:
# airflow needs a home, ~/airflow is the default,
# but you can put it somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow
# install from pypi using pip
# (this walkthrough used Airflow 1.10.11; pin apache-airflow==1.10.11 to match it)
pip install apache-airflow
# initialize the database
airflow initdb
# start the web server, default port is 8080
# airflow webserver -p 8080
# changed here: the added -D parameter lets it run in the background
airflow webserver -p 8080 -D
# start the scheduler
airflow scheduler
# visit localhost:8080 in the browser and enable the example dag in the home page
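Since the later steps edit airflow.cfg, it can help to know where that file ends up. The following is a minimal Python sketch, assuming the behaviour described in the comments above (AIRFLOW_HOME is used if set, otherwise ~/airflow). The helper name airflow_cfg_path is made up for illustration and is not part of Airflow.

```python
import os
from pathlib import Path


def airflow_cfg_path(env=None):
    """Return the expected location of airflow.cfg.

    Hypothetical helper (not an Airflow API): it mirrors the quick-start
    comment that ~/airflow is the default home unless AIRFLOW_HOME is set.
    """
    env = os.environ if env is None else env
    home = env.get("AIRFLOW_HOME", "~/airflow")
    return Path(home).expanduser() / "airflow.cfg"


# Path for the current shell environment.
print(airflow_cfg_path())
# Path when AIRFLOW_HOME points somewhere else (example value only).
print(airflow_cfg_path({"AIRFLOW_HOME": "/data/airflow"}))
```

The file itself only exists after Airflow has been started once, for example by the initdb step.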
After the above steps, you can open this page in the browser:
At this point the basic setup is visible, but for now I can find the following problems:
- Airflow stores its metadata in SQLite by default.
- The time in the upper right corner of the page is 8 hours behind the actual time.
- By default Airflow schedules tasks in a single thread, as in the picture below (I only learned this later).
Now that the problems are exposed, let's start dealing with them.
Airflow's metadata storage defaults to SQLite; now switch to MySQL
SQLite does not support multithreading, so I plan to switch to MySQL. For installing MySQL, please refer to: Linux centos install mysql
Airflow has many other optional components available, so we can choose just the ones we need. For example, we now need MySQL to store Airflow's information. The list of optional features is here: Airflow's other options
For MySQL you need:
pip install 'apache-airflow[mysql]'
Here I did not pick individual options and went straight for the full set of plug-ins, which runs into more pitfalls:
yum install mysql-devel gcc gcc-devel python-devel krb5-devel.x86_64 cyrus-sasl-devel -y
pip install 'apache-airflow[all]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
At this point you may run into many errors, for example the following:
Handle these errors patiently. They are usually version conflicts: uninstall the version shown in red, then reinstall with an explicitly specified version number.
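The uninstall-and-reinstall routine above can be sketched in Python. This is only an illustration under assumptions: it expects the conflict wording printed by pip's older dependency checker ("... has requirement ..., but you'll have ... which is incompatible"), which newer pip versions phrase differently, and the sample line uses a placeholder package name, example-lib, rather than a real error from this installation.

```python
import re
import shlex

# Message shape of pip's older dependency checker (assumption: newer pip
# versions word their conflict reports differently).
CONFLICT = re.compile(
    r"has requirement (?P<req>.+?), but you'll have "
    r"(?P<name>\S+) (?P<version>\S+) which is incompatible"
)


def fix_commands(error_line):
    """Turn one conflict line into the uninstall/reinstall commands to try."""
    match = CONFLICT.search(error_line)
    if match is None:
        return []
    return [
        "pip uninstall -y " + shlex.quote(match["name"]),
        "pip install " + shlex.quote(match["req"]),
    ]


# Made-up example line; "example-lib" is a placeholder package name.
line = (
    "ERROR: apache-airflow 1.10.11 has requirement example-lib<2.0,>=1.0, "
    "but you'll have example-lib 2.1 which is incompatible."
)
for command in fix_commands(line):
    print(command)
```

The requirement is shell-quoted because version specifiers such as < and > would otherwise be treated as redirections by the shell.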
Configure Airflow
- Configure MySQL as the storage for Airflow's metadata: run vim /etc/my.cnf, add explicit_defaults_for_timestamp=1 under [mysqld], then restart MySQL with systemctl restart mysqld.
- Create a database in MySQL. The name of my database is airflow.
- Go into the airflow folder and modify airflow.cfg.
- Solve the problem that Airflow can only run one task at a time: please refer to airflow practitioners
- Solve the problem of Airflow being eight hours behind: modify the content of airflow.cfg:
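The exact airflow.cfg values are not spelled out in the text, so the following Python sketch only shows the kind of [core] changes the list above implies. Everything in it is an assumption except the database name airflow: the user, password and host are placeholders, LocalExecutor is a common way to get past one-task-at-a-time execution once the metadata is in MySQL, and Asia/Shanghai is assumed for the eight-hour offset. The helper apply_core_settings is hypothetical and not part of Airflow.

```python
import configparser
import io


def apply_core_settings(cfg_text, settings):
    """Return cfg_text with the given [core] options overwritten.

    Note: configparser drops comments when writing, so this is meant for
    illustration, not for rewriting a hand-maintained airflow.cfg.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    for key, value in settings.items():
        parser.set("core", key, value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Placeholder credentials and host; only the database name "airflow" comes
# from the article. LocalExecutor and Asia/Shanghai are assumed values.
settings = {
    "sql_alchemy_conn": "mysql://airflow_user:airflow_pass@localhost:3306/airflow",
    "executor": "LocalExecutor",
    "default_timezone": "Asia/Shanghai",
}

# Trimmed-down stand-in for the [core] section of a fresh airflow.cfg.
sample = """[core]
executor = SequentialExecutor
sql_alchemy_conn = sqlite:////root/airflow/airflow.db
default_timezone = utc
"""

print(apply_core_settings(sample, settings))
```

After pointing sql_alchemy_conn at MySQL, airflow initdb has to be run again so that the metadata tables are created in the new database.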
This part follows the reference: airflow Modify source code
The specific steps are as follows:
# find where airflow is installed
find / -name airflow
1. Modify /root/anaconda3/envs/airflow_env/lib/python3.7/site-packages/airflow/utils/timezone.py
Below line 27, add the following; the utcnow() function is modified as well:
from airflow import configuration as conf
try:
    tz = conf.get("core", "default_timezone")
    if tz == "system":
        utc = pendulum.local_timezone()
    else:
        utc = pendulum.timezone(tz)
except Exception:
    pass
2. Modify /root/anaconda3/envs/airflow_env/lib/python3.7/site-packages/airflow/utils/sqlalchemy.py
Below line 38, add:
utc = pendulum.timezone('UTC')
from airflow import configuration as conf
try:
    tz = conf.get("core", "default_timezone")
    if tz == "system":
        utc = pendulum.local_timezone()
    else:
        utc = pendulum.timezone(tz)
except Exception:
    pass
Comment this out:
3. Modify /root/anaconda3/envs/airflow_env/lib/python3.7/site-packages/airflow/www/templates/admin/master.html
Modify the line indicated by the down arrow; the modified content is below the comment.
The modification is now complete; just restart Airflow.
For the source-code changes I followed this reference: airflow Modify China time zone (Change airflow Source code)
The final result is as follows:
With MySQL used as the storage for metadata, you can see it inside MySQL:
When logging in to Airflow:
The three problems mentioned earlier are solved.
Summary
Airflow is a scheduling tool, and this time I only covered how to install and set it up. When I have time in the future, I want to add an article on how to write Airflow task scripts. My current level only covers writing simple task scripts, so more work is needed.
Finally, if any experts read this and find something wrong with this document, or have better advice, please leave a comment below. Thank you!