When preparing our educational programs, we periodically run into difficulties with certain tools. And when we do, there is not always enough documentation or enough articles to help us deal with the problem.
That was the case, for example, in 2015, when we used a Hadoop cluster with Spark for 35 simultaneous users in the Big Data Specialist program. It was unclear how to prepare it for such a use case with YARN. In the end, we had to figure it out and walk that path on our own.
Background
This time we will talk about a different program.
Overall, everything is fine. Let the participants build their pipelines. However, there is a "but": all our programs are technology-driven in terms of the learning process itself. To check labs we use automatic checkers: a participant goes to their personal account, clicks the "Check" button, and after some time sees extended feedback on what they did. And this is exactly where we start to approach our problem.
Checking this lab works as follows: we send a control data packet to the participant's Kafka, then Gobblin transfers this packet to HDFS, and then Airflow picks it up and loads it into ClickHouse. The catch is that Airflow is not supposed to do this in real time. It runs on a schedule: once every 15 minutes it takes a batch of files and loads them.
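To make the timing problem concrete, here is a minimal sketch of how long a checker would have to wait if it relied on the schedule alone. It assumes runs are aligned to a fixed grid starting at the top of the hour (hh:00, hh:15, ...), which is a simplification: the real run times depend on how the participant configured the DAG. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_scheduled_run(now: datetime, interval_minutes: int = 15) -> datetime:
    """Return the next run time on a grid aligned to the top of the hour."""
    # Assumption: interval_minutes divides 60, so the grid restarts every hour.
    base = now.replace(minute=0, second=0, microsecond=0)
    interval = timedelta(minutes=interval_minutes)
    runs_done = (now - base) // interval  # whole intervals already passed
    return base + (runs_done + 1) * interval


def wait_until_next_run(now: datetime, interval_minutes: int = 15) -> timedelta:
    """How long the checker would wait if it relied on the schedule alone."""
    return next_scheduled_run(now, interval_minutes) - now
```

With this grid, a participant who clicks "Check" one minute after a run waits 14 minutes before the control packet even starts moving to ClickHouse.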
So it turns out we need to trigger their DAG ourselves, on request, while the checker is running here and now. After some googling, we found that later versions of Airflow have the so-called Experimental API. The word experimental does sound scary, but what can you do... It might just work.
Below we describe the whole path: from installing Airflow to building a POST request that triggers a DAG through the Experimental API. We will be working on Ubuntu 16.04.
1. Installing Airflow
Let's check that Python 3 and virtualenv are installed.
$ python3 --version
Python 3.6.6
$ virtualenv --version
15.2.0
If either of them is missing, install it.
Now let's create a directory where we will continue working with Airflow.
$ mkdir <your name of directory>
$ cd /path/to/your/new/directory
$ virtualenv -p $(which python3) venv
$ source venv/bin/activate
(venv) $
Install Airflow:
(venv) $ pip install apache-airflow
The version we worked with: 1.10.
Now we need to create the airflow_home directory, which will hold the DAG files and Airflow plugins. After creating the directory, set the AIRFLOW_HOME environment variable.
(venv) $ cd /path/to/my/airflow/workspace
(venv) $ mkdir airflow_home
(venv) $ export AIRFLOW_HOME=<path to airflow_home>
The next step is to run the command that creates and initializes the Airflow database in SQLite:
(venv) $ airflow initdb
By default, the database is created in airflow.db.
Let's check that Airflow is installed:
$ airflow version
[2018-11-26 19:38:19,607] {__init__.py:57} INFO - Using executor SequentialExecutor
[2018-11-26 19:38:19,745] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
[2018-11-26 19:38:19,771] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ _ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ ____/____/|__/
v1.10.0
If the command worked, Airflow has created its configuration file, airflow.cfg, in AIRFLOW_HOME:
$ tree
.
├── airflow.cfg
└── unittests.cfg
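The same check can be done from Python, which is handy if a checker or setup script needs to confirm the environment before going further. This is a small sketch using only the standard library; the function name is hypothetical, and it deliberately treats an unset AIRFLOW_HOME as an error because the steps above set it explicitly.

```python
import os
from pathlib import Path


def find_airflow_cfg(environ=os.environ) -> Path:
    """Return the path to airflow.cfg inside AIRFLOW_HOME, or raise if missing."""
    home = environ.get("AIRFLOW_HOME")
    if not home:
        raise RuntimeError("AIRFLOW_HOME is not set")
    cfg = Path(home) / "airflow.cfg"
    if not cfg.is_file():
        raise FileNotFoundError(
            "{} not found; has an airflow command been run yet?".format(cfg)
        )
    return cfg
```

Calling find_airflow_cfg() in the activated environment should return the path of the airflow.cfg file listed above.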
Airflow has a web interface. It can be started with the command:
(venv) $ airflow webserver --port 8081
The web interface is now available in a browser on port 8081 of the host where Airflow is running, for example: <hostname:8081>.
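If no browser is at hand, a plain TCP connection attempt is enough to tell whether the web server is listening. The sketch below uses only the standard library; the function name and the 2-second timeout are arbitrary choices for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For the setup above, port_is_open('<your hostname>', 8081) should return True once the web server has started. It only proves that something is listening on the port, not that it is Airflow.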
2. Working with the Experimental API
At this point Airflow is configured and ready to go. However, we also need to get the Experimental API working. Our checkers are written in Python, so all requests from here on are made with the requests library.
In fact, the API already works for simple requests. For example, this request checks that it is up:
>>> import requests
>>> host = '<your hostname>'
>>> airflow_port = 8081  # 8081 in our case; the default is 8080
>>> requests.get('http://{}:{}/{}'.format(host, airflow_port, 'api/experimental/test')).text
'OK'
If you got this response, everything is working.
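For use inside a checker, the same health check can be wrapped in a function that returns a boolean instead of raising. This is a sketch under a few assumptions: it uses the standard library's urllib rather than requests so that it has no dependencies, the helper names are hypothetical, and it relies on the test endpoint answering with the body OK as shown above.

```python
from urllib.request import urlopen


def experimental_url(host: str, port: int, endpoint: str) -> str:
    """Build a URL under /api/experimental/ for the given endpoint."""
    return "http://{}:{}/api/experimental/{}".format(host, port, endpoint.lstrip("/"))


def api_is_alive(host: str, port: int = 8081, timeout: float = 5.0) -> bool:
    """Return True if the test endpoint answers with the body 'OK'."""
    try:
        with urlopen(experimental_url(host, port, "test"), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip() == "OK"
    except OSError:
        # URLError and HTTPError are OSError subclasses; treat them as "not alive".
        return False
```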
However, when we try to trigger a DAG, we find that this kind of request cannot be made without authentication.
To set that up, a few more steps are needed.
First, add the following to the config:
[api]
auth_backend = airflow.contrib.auth.backends.password_auth
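To confirm the setting was picked up from the file you actually edited, the value can be read back with the standard library's configparser. This is only a sketch: the function name is hypothetical, and interpolation is switched off on the assumption that the config file contains literal % characters (for example in log formats).

```python
import configparser


def get_auth_backend(cfg_path: str) -> str:
    """Read [api] auth_backend from an Airflow config file ('' if absent)."""
    # interpolation=None keeps '%' characters in other options from being expanded;
    # strict=False tolerates duplicated sections or options (the last one wins).
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(cfg_path)
    return parser.get("api", "auth_backend", fallback="")
```

After the edit above, get_auth_backend on your airflow.cfg should return airflow.contrib.auth.backends.password_auth.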
Then create your user with admin rights:
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.Admin())
>>> user.username = 'new_user_name'
>>> user.password = 'set_the_password'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
Next, create a user with regular rights who will be allowed to trigger the DAG.
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'newprolab'
>>> user.password = 'Newprolab2019!'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
Now everything is ready.
3. Sending the POST request
The POST request itself looks like this:
>>> import json
>>> dag_id = 'newprolab'
>>> url = 'http://{}:{}/{}/{}/{}'.format(host, airflow_port, 'api/experimental/dags', dag_id, 'dag_runs')
>>> data = {"conf": "{\"key\":\"value\"}"}
>>> headers = {'Content-type': 'application/json'}
>>> auth = ('newprolab', 'Newprolab2019!')
>>> uri = requests.post(url, data=json.dumps(data), headers=headers, auth=auth)
>>> uri.text
'{\n "message": "Created <DagRun newprolab @ 2019-03-27 10:24:25+00:00: manual__2019-03-27T10:24:25+00:00, externally triggered: True>"\n}\n'
The request was processed successfully.
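In a checker it is convenient to have the trigger call as a reusable function. The sketch below mirrors the request above but uses the standard library's urllib instead of requests, so it is not the code from our checkers. It assumes the password backend accepts HTTP Basic credentials (which is what the auth tuple in requests sends) and that the reply is JSON with a message field, as in the output above. The function names are hypothetical.

```python
import base64
import json
from urllib.request import Request, urlopen


def build_trigger_request(host, port, dag_id, conf, username, password):
    """Build (but do not send) a POST request that triggers a run of dag_id."""
    url = "http://{}:{}/api/experimental/dags/{}/dag_runs".format(host, port, dag_id)
    # As in the request above, conf travels as a JSON string inside the JSON body.
    body = json.dumps({"conf": json.dumps(conf)}).encode("utf-8")
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Basic " + token.decode("ascii"),
    }
    return Request(url, data=body, headers=headers, method="POST")


def trigger_dag(host, port, dag_id, conf, username, password, timeout=10.0):
    """Send the trigger request and return the 'message' field of the reply."""
    request = build_trigger_request(host, port, dag_id, conf, username, password)
    with urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["message"]
```

Keeping the request construction separate from the network call makes it possible to inspect the URL, body and headers without a running Airflow instance. A non-2xx reply (for example, wrong credentials) surfaces as urllib.error.HTTPError from trigger_dag.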
After that, we give the DAG some time to run and then query the ClickHouse table, trying to catch the control data packet.
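"Give the DAG some time and then query" can be expressed as a polling loop with a deadline. The sketch below is generic: check stands for whatever function queries ClickHouse for the control packet (not shown here, and hypothetical), and the default timeout and interval are illustrative values, not the ones we use.

```python
import time


def wait_for(check, timeout=120.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or timeout seconds pass.

    Returns the truthy result of check(), or None if the deadline is reached.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)
```

A checker could then call something like wait_for(lambda: control_packet_arrived(packet_id)), where control_packet_arrived is a hypothetical function that runs the ClickHouse query, and report failure to the participant if the result is None. The sleep and clock parameters exist only so the loop can be exercised in tests without real waiting.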
The check is complete.
Source: www.habr.com