How to trigger a DAG in Airflow using the Experimental API

While preparing our educational programs, we regularly run into difficulties with certain tools. When that happens, there is not always enough documentation or articles to help us deal with the problem.

That is what happened, for example, in 2015, when we were using a Hadoop cluster with Spark for 35 simultaneous users in the Big Data Specialist program. It was not clear how to set it up for such a use case with YARN. In the end, once we had figured it out and gone down that path ourselves, we wrote a post on Habr and also gave a talk at the Moscow Spark Meetup.

Background

This time we will talk about a different program: Data Engineer. In it, our participants build two types of architecture: lambda and kappa. In the lambda architecture, Airflow is used as part of the batch layer to move logs from HDFS to ClickHouse.

Overall, everything is fine: let them build their pipelines. There is a catch, though: all our programs are technology-driven in the learning process itself. To check labs, we use automatic checkers: a participant goes to their personal account, clicks the "Check" button, and after a while gets extended feedback on what they did. This is exactly where our problem begins.

The check for this lab works as follows: we send a control data packet to the participant's Kafka, Gobblin then moves this packet to HDFS, and Airflow then picks it up and loads it into ClickHouse. The catch is that Airflow is not supposed to do this in real time; it works on a schedule: once every 15 minutes it takes a batch of files and loads them.
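To make the delay concrete, here is a small sketch of how long a packet can wait before the next scheduled run even starts. It assumes the DAG's 15-minute schedule is aligned to :00, :15, :30 and :45 and that a file is picked up by the run starting at the next boundary; `next_batch_start` is a hypothetical helper, not part of Airflow.

```python
from datetime import datetime, timedelta


def next_batch_start(arrival: datetime, interval_minutes: int = 15) -> datetime:
    """Return the first schedule boundary strictly after `arrival`.

    Assumes the schedule is aligned to the top of the hour and that
    `interval_minutes` divides 60 (true for the 15-minute case).
    """
    hour_start = arrival.replace(minute=0, second=0, microsecond=0)
    step = timedelta(minutes=interval_minutes)
    # Whole intervals already elapsed this hour, plus one for the next boundary.
    intervals = (arrival - hour_start) // step + 1
    return hour_start + intervals * step


arrival = datetime(2019, 3, 27, 10, 7)  # packet lands in HDFS at 10:07
start = next_batch_start(arrival)
print(start - arrival)  # 0:08:00
```

So a checker that simply waits for the schedule could leave a participant waiting up to a quarter of an hour before processing even begins.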

It turns out that we need to trigger their DAG ourselves, on demand, while the checker is running, here and now. After some googling, we found that recent versions of Airflow have a so-called Experimental API. The word "experimental" sounds scary, of course, but what can you do... Maybe it will just work.

Below we describe the whole path: from installing Airflow to sending a POST request that triggers a DAG through the Experimental API. We work on Ubuntu 16.04.

1. Installing Airflow

Let's check that Python 3 and virtualenv are installed.

$ python3 --version
Python 3.6.6
$ virtualenv --version
15.2.0

If either of them is missing, install it.

Let's create a directory in which we will keep working with Airflow.

$ mkdir <your name of directory>
$ cd /path/to/your/new/directory
$ virtualenv -p "$(which python3)" venv
$ source venv/bin/activate
(venv) $

Install Airflow:

(venv) $ pip install apache-airflow

The version we worked with: 1.10.

Now we need to create the airflow_home directory, which will hold the DAG files and Airflow plugins. Once the directory exists, set the AIRFLOW_HOME environment variable.

(venv) $ cd /path/to/my/airflow/workspace
(venv) $ mkdir airflow_home
(venv) $ export AIRFLOW_HOME=<path to airflow_home>
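If AIRFLOW_HOME is not exported in a new shell, Airflow falls back to ~/airflow, which is an easy way to end up with two separate installations. The sketch below approximates how the path is resolved (Airflow additionally expands environment variables inside it); `airflow_paths` is a hypothetical helper, and the dags/plugins locations are the defaults written to a freshly generated airflow.cfg, not values read from your file.

```python
import os


def airflow_paths(env=None):
    """Resolve AIRFLOW_HOME and the default locations derived from it."""
    env = os.environ if env is None else env
    # Airflow falls back to ~/airflow when AIRFLOW_HOME is not set.
    home = os.path.expanduser(env.get("AIRFLOW_HOME", "~/airflow"))
    return {
        "home": home,
        "config": os.path.join(home, "airflow.cfg"),
        # Defaults of dags_folder and plugins_folder in a generated airflow.cfg.
        "dags": os.path.join(home, "dags"),
        "plugins": os.path.join(home, "plugins"),
    }


print(airflow_paths()["config"])
```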

The next step is to run the command that creates and initializes Airflow's metadata database in SQLite:

(venv) $ airflow initdb

By default, the database is created in the airflow.db file inside AIRFLOW_HOME.
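To confirm that initdb really produced the metadata database, you can open it with Python's built-in sqlite3 module. This is a minimal sketch: `list_tables` is a hypothetical helper, and the file name assumes the default sql_alchemy_conn setting. Opening the file read-only avoids silently creating an empty airflow.db if the path is wrong. After initdb you would expect Airflow tables such as dag_run in the list.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in an existing SQLite file (opened read-only)."""
    # mode=ro makes sqlite3 fail instead of creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example (placeholder path):
# print(list_tables("/path/to/airflow_home/airflow.db"))
```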

Check that Airflow is installed:

$ airflow version
[2018-11-26 19:38:19,607] {__init__.py:57} INFO - Using executor SequentialExecutor
[2018-11-26 19:38:19,745] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
[2018-11-26 19:38:19,771] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ _ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  ____/____/|__/
   v1.10.0

If the command worked, Airflow has created its own configuration file, airflow.cfg, in AIRFLOW_HOME:

$ tree
.
├── airflow.cfg
└── unittests.cfg

Airflow has a web interface. You can start it with the following command:

(venv) $ airflow webserver --port 8081

You can now open the web interface in a browser on port 8081 of the host where Airflow is running, like this: <hostname:8081>.

2. Working with the Experimental API

At this point Airflow is configured and ready to go. However, we also need to get the Experimental API working. Our checkers are written in Python, so all the requests below use the requests library.

The API already works for simple requests. For example, this request lets you check that it is up:

>>> import requests
>>> host = '<your hostname>'  # the host where Airflow is running
>>> airflow_port = 8081  # our port; the default is 8080
>>> requests.get('http://{}:{}/{}'.format(host, airflow_port, 'api/experimental/test')).text
'OK'

If you get this message in the response, everything is working.
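If requests is not available where the checker runs, the same health check can be done with the standard library. A minimal sketch, assuming the URL layout used above; `api_url` and `ping` are hypothetical helper names, and the default port 8080 matches the note in the snippet above.

```python
from urllib.request import urlopen


def api_url(host, port, *parts):
    """Build http://<host>:<port>/api/experimental/<parts...>."""
    path = "/".join(["api/experimental", *parts])
    return "http://{}:{}/{}".format(host, port, path)


def ping(host, port=8080, timeout=5):
    """GET the test endpoint and return the raw response body as text."""
    with urlopen(api_url(host, port, "test"), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(api_url("localhost", 8081, "test"))
```

Unlike requests, urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a failed check surfaces as an exception rather than a status code to inspect.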

However, when we tried to trigger a DAG, we found that this kind of request cannot be made without authentication.

To make it work, a few more steps are needed.

First, add the following to the config (airflow.cfg):

[api]
auth_backend = airflow.contrib.auth.backends.password_auth
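If you provision checker machines with a script rather than editing airflow.cfg by hand, the same change can be made with configparser. A minimal sketch: `enable_password_auth` is a hypothetical helper. Interpolation is disabled so Airflow's `{AIRFLOW_HOME}` placeholders and `%%` sequences are written back unchanged. Note that ConfigParser.write drops comments, so keep a backup of the original file.

```python
import configparser


def enable_password_auth(cfg_path):
    """Set [api] auth_backend to password_auth in an existing airflow.cfg."""
    cfg = configparser.ConfigParser(interpolation=None)
    if not cfg.read(cfg_path):  # read() returns the list of files it parsed
        raise FileNotFoundError(cfg_path)
    if not cfg.has_section("api"):
        cfg.add_section("api")
    cfg.set("api", "auth_backend", "airflow.contrib.auth.backends.password_auth")
    with open(cfg_path, "w") as f:
        cfg.write(f)
```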

Then create a user with admin rights:

>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'new_user_name'
>>> user.password = 'set_the_password'
>>> user.superuser = True  # grants admin rights
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()

Next, create a user with regular permissions, who will be allowed to trigger DAGs.

>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'newprolab'
>>> user.password = 'Newprolab2019!'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()

Now everything is ready.

3. Sending a POST request

The POST request itself looks like this:

>>> import json
>>> dag_id = 'newprolab'
>>> url = 'http://{}:{}/{}/{}/{}'.format(host, airflow_port, 'api/experimental/dags', dag_id, 'dag_runs')
>>> data = {"conf": json.dumps({"key": "value"})}  # conf is sent as a JSON-encoded string
>>> headers = {'Content-type': 'application/json'}
>>> auth = ('newprolab', 'Newprolab2019!')
>>> uri = requests.post(url, data=json.dumps(data), headers=headers, auth=auth)
>>> uri.text
'{\n  "message": "Created <DagRun newprolab @ 2019-03-27 10:24:25+00:00: manual__2019-03-27T10:24:25+00:00, externally triggered: True>"\n}\n'

The request was processed successfully.
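For reference, here is a standard-library sketch of the same trigger request, which also shows what requests does with `auth=(...)`: it adds an HTTP Basic `Authorization` header. `build_trigger_request` and `run_id_from_message` are hypothetical helpers. Like the example above, the sketch sends `conf` as a JSON-encoded string, and the run_id parser relies on the exact "Created <DagRun ...>" message format shown in the response, so treat it as fragile.

```python
import base64
import json
import re
from urllib.request import Request


def build_trigger_request(host, port, dag_id, conf, user, password):
    """Build (but do not send) the POST that triggers `dag_id`."""
    url = "http://{}:{}/api/experimental/dags/{}/dag_runs".format(host, port, dag_id)
    # conf itself is JSON-encoded, then wrapped in the JSON request body.
    body = json.dumps({"conf": json.dumps(conf)}).encode("utf-8")
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode("ascii")
    return Request(url, data=body, method="POST", headers={
        "Content-Type": "application/json",
        "Authorization": "Basic " + token,  # same header requests builds from auth=
    })


def run_id_from_message(text):
    """Pull the run_id (e.g. manual__...) out of the API's response body."""
    message = json.loads(text)["message"]
    match = re.search(r": (\S+), externally triggered", message)
    return match.group(1) if match else None


# To actually send it: urllib.request.urlopen(build_trigger_request(...), timeout=10)
```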

Accordingly, we then give the DAG some time to process the data and query the ClickHouse table, trying to catch the control data packet.
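The article does not give the checker's actual query, table or timeouts, so the following is only a sketch of the "wait, then look for the packet" step. `wait_for` and `clickhouse_query_url` are hypothetical helpers, and the timeout and interval values are placeholders. It relies on ClickHouse's HTTP interface, which accepts SQL in the `query` parameter on port 8123 by default.

```python
import time
from urllib.parse import urlencode


def clickhouse_query_url(host, query, port=8123):
    """URL for ClickHouse's HTTP interface with the SQL in the query parameter."""
    return "http://{}:{}/?{}".format(host, port, urlencode({"query": query}))


def wait_for(check, timeout=600, interval=30, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage (table and packet id are placeholders):
# url = clickhouse_query_url("ch-host", "SELECT count() FROM logs WHERE packet_id = 'control'")
# found = wait_for(lambda: urllib.request.urlopen(url).read().strip() != b"0")
```

Injecting `clock` and `sleep` keeps the polling loop testable without real waiting.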

The check is complete.

Source: www.habr.com
