αααα»αααΆααααα ααααααα·ααΈα’ααααααααααΎα ααΎααα½ααααααααΆαααααΆαααΆαααα αΆαααΆααααααΉαααΆαααααΎααΆαααΆαα½αα§ααααααα½αα ααα½αα α αΎααα αααααααα ααααααααΎααα½αααααααα½ααα ααΆαα·ααααααααΆαα―αααΆα αα·αα’ααααααααααααααΆαααααα’αΆα αα½ααααααααΆααααα αΆαααααΆαααα
ααΌα
ααααααΆααΆα§ααΆα ααααααα»αααααΆα 2015 α αΎαααΎαααΆαααααΎα
ααααα Hadoop ααΆαα½α Spark αααααΆααα’αααααααΎααααΆαα 35 ααΆαααααα»ααααααααΆαααααΆαα
ααΎαααααα·ααΈ Big Data Specialist α ααΆαα·αα
αααΆααα’αααΈαααααααα
αααΆαααααΆααααααΈα’αααααααΎααααΆαααααααααααααααΎ YARN ααα ααΆαααααα αααααΆααααααααα αα·αααΎαααΎααααΌαααααααα½αα―α αα½αααααΆαααααΎ
αααααα»αααααααααα·ααΆααααα
ααΎααααααΎαααΉααα·ααΆαα’αααΈαααααα·ααΈααααα -
ααΆααΌαα α’αααΈαααΊααα’α α’αα»ααααΆαα±αααα½αααααΆααααααααααααααα½αααα ααααααΆαααΆααααα ααΆα "ααα»αααα"α αααααα·ααΈααααααΎαααΆααα’ααααΊααΆαααΆαααΏαααΏαααΆααα αα αααα·ααααΆααΆααααααΉαααααΎαααΆααα·ααααΆααααααα½αα―αα ααΎααααΈαα·αα·αααααΎααααααΈααα·ααααα ααΎαααααΎα§ααααααα·αα·ααααααααααααααααααα·α α’αααα αΌααα½αααααΌαα αΌααα ααΆααααααΈααααΆαααααα½αααααααΆαα α α»α αααΌαα»α "αα·αα·ααα" α αΎααα½αααααα»ααααααααααΆααααΎαααααααααααα·αααααα’αα½αα ααα½αααΎα’αααΈαααααΆααααΆαααααΎα α αΎαααΆααΊαα α ααα»α ααααααααΎαα αΆααααααΎααα·ααα αααααα αΆααααααΎαα
ααΆααα·αα·αααααΎααααααΈααα·ααααααααααααΌαααΆααααα αααΌα ααΆααααααα ααΎαααααΎαααα αααα·αααααααααα½ααα·αα·ααααα Kafka ααααα’αααα αΌααα½α αααααΆαααα Gobblin ααααααααα αααα·ααααααααααα HDFS αααααΆαααα Airflow αααααα αααα·ααααααααα α αΎαααΆααααΆαα αααα»α ClickHouse α αααα·α ααΊααΆ Airflow αα·αα αΆαααΆα αααααΎαααααααααα»ααααααααΆααΆαααααααααααα ααΆααααΎααΆααΆαααΆααα·ααΆαα αααααΆαα 15 ααΆααΈαααα ααΆααααΌαααΆαα―αααΆαααΆα αααΎα α αΎααααα ααααΆα
ααΆααααααΆααΎαα
αΆαααΆα
αααααΌαα
αΆααααααΎα DAG αααααα½αααααααααα½αα―αααΆαααααΎααααααΎααααααααααα’ααααααα½ααα·αα·ααααααα»αααααΎαααΆααα
ααΈααααα·αα₯α‘αΌααααα Googling ααΎαααΆαααααΎαααΆαααααΆααααααααααααα Airflow ααΊααΆα’αααΈαααααα α
ααΆ experimental
αα·αααΆαα ααΆααααΆαααα
αα½αα±ααααααΆα
ααα»ααααα’αααΈαααααααΌαααααΎ ... ααΆααααΆαααααααααα
αα·αα
Below we walk through the whole path, from installing Airflow to sending a POST request that triggers a DAG through the experimental API. We will be working on Ubuntu 16.04.
1. Installation
First, check that Python 3 and virtualenv are installed.
$ python3 --version
Python 3.6.6
$ virtualenv --version
15.2.0
If either of them is missing, install it.
Now create the directory in which we will continue working with Airflow.
$ mkdir <your name of directory>
$ cd /path/to/your/new/directory
$ virtualenv -p $(which python3) venv
$ source venv/bin/activate
(venv) $
Install the package:
(venv) $ pip install apache-airflow
The version we worked with: 1.10.
Now create the airflow_home directory, which will hold the DAG files and Airflow plugins. After creating it, set the AIRFLOW_HOME environment variable.
(venv) $ cd /path/to/my/airflow/workspace
(venv) $ mkdir airflow_home
(venv) $ export AIRFLOW_HOME=<path to airflow_home>
The next step is to run the command that creates and initializes the Airflow database in SQLite:
(venv) $ airflow initdb
By default the database is created in airflow.db.
Check whether Airflow is installed:
$ airflow version
[2018-11-26 19:38:19,607] {__init__.py:57} INFO - Using executor SequentialExecutor
[2018-11-26 19:38:19,745] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
[2018-11-26 19:38:19,771] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
v1.10.0
If the command worked, Airflow has created its configuration file airflow.cfg in AIRFLOW_HOME:
$ tree
.
├── airflow.cfg
└── unittests.cfg
Airflow has a web interface. It can be started by running the command:
(venv) $ airflow webserver --port 8081
You can now open the web interface in a browser on port 8081 of the host where Airflow is running, for example <hostname:8081>.
2. Working with the experimental API
At this point Airflow is configured and ready to go. We also need to work with its experimental API. The examples here are in Python, so all requests below are made with the requests library.
In fact, the API already works for simple requests. For example, this request lets you test that it is up:
>>> import requests
>>> host = <your hostname>
>>> airflow_port = 8081  # in our case; the default is 8080
>>> requests.get('http://{}:{}/{}'.format(host, airflow_port, 'api/experimental/test')).text
'OK'
If you got this message in response, everything is working.
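The health check above can be wrapped in a small helper. This is only a sketch: the function names `api_url` and `api_is_alive` are made up for illustration, and it uses the standard library's `urllib.request` instead of `requests` so that it runs without extra packages. The port 8081 is the one used in this walkthrough.

```python
import urllib.request


def api_url(host, port, path):
    """Build a URL for an endpoint under Airflow's experimental API."""
    return 'http://{}:{}/api/experimental/{}'.format(host, port, path.lstrip('/'))


def api_is_alive(host, port=8081, timeout=5):
    """Return True if GET /api/experimental/test answers with HTTP 200."""
    try:
        with urllib.request.urlopen(api_url(host, port, 'test'), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling `api_is_alive('<your hostname>')` before sending other requests gives a quick yes/no answer instead of a traceback when the webserver is down.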
However, as soon as we try to trigger a DAG, it turns out that this kind of request cannot be made without authentication.
Getting there takes a few more steps.
First, add this to the config:
[api]
auth_backend = airflow.contrib.auth.backends.password_auth
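To confirm that the setting was picked up, the config file can be read back. The sketch below assumes the file is `airflow.cfg` inside AIRFLOW_HOME (falling back to `~/airflow`, Airflow's default home); the helper name `read_auth_backend` is hypothetical.

```python
import configparser
import os


def read_auth_backend(cfg_path):
    """Return the [api] auth_backend value from an airflow.cfg, or None if unset."""
    # interpolation=None: airflow.cfg contains '%' characters in its log formats.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently skipped
    return parser.get('api', 'auth_backend', fallback=None)


if __name__ == '__main__':
    home = os.environ.get('AIRFLOW_HOME', os.path.expanduser('~/airflow'))
    print(read_auth_backend(os.path.join(home, 'airflow.cfg')))
```

The webserver reads the config at startup, so it most likely needs a restart after the file is edited.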
Then create a user with admin rights:
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.Admin())
>>> user.username = 'new_user_name'
>>> user.password = 'set_the_password'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
Next, create a user with regular rights who will be allowed to trigger the DAG.
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'newprolab'
>>> user.password = 'Newprolab2019!'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
Now everything is ready.
3. Sending the POST request
The POST request itself looks like this:
>>> import json
>>> dag_id = 'newprolab'
>>> url = 'http://{}:{}/{}/{}/{}'.format(host, airflow_port, 'api/experimental/dags', dag_id, 'dag_runs')
>>> data = {"conf": "{\"key\": \"value\"}"}
>>> headers = {'Content-type': 'application/json'}
>>> auth = ('newprolab', 'Newprolab2019!')
>>> uri = requests.post(url, data=json.dumps(data), headers=headers, auth=auth)
>>> uri.text
'{\n  "message": "Created <DagRun newprolab @ 2019-03-27 10:24:25+00:00: manual__2019-03-27T10:24:25+00:00, externally triggered: True>"\n}\n'
The request was processed successfully.
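For repeated use, the trigger call can be split into "build the request" and "send it", which makes the payload easy to inspect before anything goes over the network. This is a minimal sketch, not the authors' code: the function names are hypothetical, it uses `urllib.request` from the standard library rather than `requests`, and it assumes the password_auth backend accepts HTTP Basic credentials, as the `auth=(user, password)` tuple in the request above suggests.

```python
import base64
import json
import urllib.request


def build_trigger_request(host, port, dag_id, conf, username, password):
    """Prepare (but do not send) a POST to .../dags/<dag_id>/dag_runs."""
    url = 'http://{}:{}/api/experimental/dags/{}/dag_runs'.format(host, port, dag_id)
    # As in the request above, "conf" is sent as a JSON-encoded string.
    body = json.dumps({'conf': json.dumps(conf)}).encode('utf-8')
    credentials = '{}:{}'.format(username, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    headers = {'Content-Type': 'application/json',
               'Authorization': 'Basic ' + token}
    return urllib.request.Request(url, data=body, headers=headers, method='POST')


def trigger_dag(request, timeout=10):
    """Send the prepared request and return the decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

A typical call would be `trigger_dag(build_trigger_request(host, 8081, 'newprolab', {'key': 'value'}, 'newprolab', '<password>'))`; an HTTP error status surfaces as `urllib.error.HTTPError`.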
ααΌα ααααα αΎα ααΎαααααααααααααΆααααααα DAG ααΎααααΈααααΎαααΆα αα·αααααΎααΆαααααΎαα»ααα ααΆααααΆααΆα ClickHouse αααααααΆααΆαα αΆαααααααα αααα·αααααααααααααααα
ααΆααααααααααΆααααΆααααα ααα
Source: www.habr.com