How to trigger a DAG in Airflow using the Experimental API

While preparing our educational programs, we periodically run into difficulties working with certain tools. And at the moment we run into them, there is not always enough documentation or enough articles to help solve the problem.

That was the case in 2015, for example, when we used a Hadoop cluster with Spark for 35 simultaneous users on the Big Data Specialist program. It was not clear how to set it up for that use case with YARN. In the end, having figured it out and walked the path ourselves, we wrote a post on Habré and also spoke at the Moscow Spark Meetup.

Background

This time we will talk about a different program: Data Engineer. On it, our participants build two types of architecture: lambda and kappa. In the lambda architecture, Airflow is used as part of batch processing to move logs from HDFS to ClickHouse.

All in all, everything is fine. Let them build their pipelines. However, there is a "but": all of our programs are technology-driven in terms of the learning process itself. To check the labs, we use automatic checkers: the participant goes to their personal account, clicks the "Check" button, and after a while sees some extended feedback on what they did. And this is where we start to approach our problem.

Checking this lab works as follows: we send a control data packet to the participant's Kafka, then Gobblin moves this data packet to HDFS, then Airflow takes this data packet and puts it into ClickHouse. The catch is that Airflow does not have to do this in real time; it does it on a schedule: once every 15 minutes it takes a batch of files and loads them.

It turns out that we need to trigger their DAG ourselves, on demand, while the checker is running, here and now. After some googling, we found that later versions of Airflow have a so-called Experimental API. The word experimental sounds scary, of course, but what can you do... Maybe it will take off.

Below we describe the whole path: from installing Airflow to forming a POST request that triggers a DAG using the Experimental API. We will work with Ubuntu 16.04.

1. Installing Airflow

Let's check that we have Python 3 and virtualenv installed.

$ python3 --version
Python 3.6.6
$ virtualenv --version
15.2.0

If either of these is missing, install it.

Now let's create a directory in which we will continue working with Airflow.

$ mkdir <your name of directory>
$ cd /path/to/your/new/directory
$ virtualenv -p "$(which python3)" venv
$ source venv/bin/activate
(venv) $

Install Airflow (the package is published on PyPI as apache-airflow):

(venv) $ pip install apache-airflow

The version we worked with: 1.10.

Now we need to create the airflow_home directory, where the DAG files and Airflow plugins will live. After creating the directory, set the AIRFLOW_HOME environment variable.

(venv) $ cd /path/to/my/airflow/workspace
(venv) $ mkdir airflow_home
(venv) $ export AIRFLOW_HOME=<path to airflow_home>
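
Since the checkers are Python scripts, it can be handy to resolve the same locations from code. The following is a minimal sketch, assuming the Airflow 1.10 defaults (a fallback of ~/airflow when AIRFLOW_HOME is unset, and airflow.cfg, airflow.db and the dags folder directly under that directory). The helper name airflow_paths is hypothetical and not part of Airflow.

```python
import os


def airflow_paths(environ=None):
    """Return the default locations Airflow 1.10 derives from AIRFLOW_HOME.

    Hypothetical helper for illustration; it only computes paths and does
    not check that they exist.
    """
    environ = os.environ if environ is None else environ
    # Airflow falls back to ~/airflow when AIRFLOW_HOME is not set.
    home = environ.get('AIRFLOW_HOME') or os.path.expanduser('~/airflow')
    return {
        'home': home,
        'config': os.path.join(home, 'airflow.cfg'),
        'database': os.path.join(home, 'airflow.db'),  # default SQLite file
        'dags': os.path.join(home, 'dags'),            # default dags_folder
    }


if __name__ == '__main__':
    for name, path in airflow_paths().items():
        print('{:<9}{}'.format(name, path))
```

If dags_folder or the database connection has been changed in airflow.cfg, those settings take precedence over the paths computed here.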

The next step is to run the command that creates and initializes the Airflow database in SQLite:

(venv) $ airflow initdb

By default, the database is created in airflow.db.

Check that Airflow is installed:

$ airflow version
[2018-11-26 19:38:19,607] {__init__.py:57} INFO - Using executor SequentialExecutor
[2018-11-26 19:38:19,745] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
[2018-11-26 19:38:19,771] {driver.py:123} INFO - Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ _ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  ____/____/|__/
   v1.10.0

If the command worked, Airflow has created its configuration file airflow.cfg in AIRFLOW_HOME:

$ tree
.
├── airflow.cfg
└── unittests.cfg

Airflow has a web interface. It can be launched by running the command:

(venv) $ airflow webserver --port 8081

You can now reach the web interface in a browser on port 8081 of the host where Airflow is running, for example: <hostname:8081>.

2. Working with the Experimental API

At this point Airflow is configured and ready to go. However, we also need to get the Experimental API running. Our checkers are written in Python, so all further requests will be made in Python using the requests library.

In fact, the API already works for simple requests. For example, this request lets you test that it is working:

>>> import requests
>>> host = '<your hostname>'
>>> airflow_port = 8081  # our port; the default is 8080
>>> requests.get('http://{}:{}/{}'.format(host, airflow_port, 'api/experimental/test')).text
'OK'

If you received this message in response, everything is working.
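
In a checker, the same health check is easier to reuse as a function. Below is a minimal sketch, assuming plain HTTP and the api/experimental/ path prefix used in the request above. The names experimental_url and api_is_up are hypothetical helpers, not part of Airflow or requests.

```python
def experimental_url(host, port, *parts):
    """Build a URL under /api/experimental/ from path segments."""
    return 'http://{}:{}/api/experimental/{}'.format(host, port, '/'.join(parts))


def api_is_up(host, port, timeout=5):
    """Return True if the test endpoint answers with HTTP 200.

    Connection errors and timeouts are reported as False rather than raised.
    """
    import requests  # third-party; imported here so the URL helper works without it
    try:
        response = requests.get(experimental_url(host, port, 'test'), timeout=timeout)
    except requests.RequestException:
        return False
    return response.status_code == 200


if __name__ == '__main__':
    print(experimental_url('localhost', 8081, 'test'))
```

The sketch checks only the status code, not the response body, so it does not depend on the exact text the endpoint returns.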

However, when we want to trigger a DAG, we run into the fact that this kind of request cannot be made without authentication.

To enable it, a few more steps are needed.

First, add this to the config:

[api]
auth_backend = airflow.contrib.auth.backends.password_auth
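
To confirm that the setting actually ended up in the right section of airflow.cfg, the file can be parsed with the standard library. This is a minimal sketch; api_auth_backend is a hypothetical helper name, and the sample text stands in for a real airflow.cfg.

```python
import configparser


def api_auth_backend(cfg_text):
    """Return the [api] auth_backend value from airflow.cfg text, or None."""
    # Interpolation is disabled so that literal '%' characters in other
    # options (for example log formats) are not treated as placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get('api', 'auth_backend', fallback=None)


if __name__ == '__main__':
    sample = (
        '[core]\n'
        'executor = SequentialExecutor\n'
        '\n'
        '[api]\n'
        'auth_backend = airflow.contrib.auth.backends.password_auth\n'
    )
    print(api_auth_backend(sample))
```

For a real installation, pass in the contents of the airflow.cfg file located in AIRFLOW_HOME. The webserver typically needs a restart before a changed backend takes effect.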

Then create your user with admin rights:

>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.Admin())
>>> user.username = 'new_user_name'
>>> user.password = 'set_the_password'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()

Next, create a user with regular rights who will be allowed to trigger the DAG.

>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'newprolab'
>>> user.password = 'Newprolab2019!'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()

Now everything is ready.

3. Sending the POST request

The POST request itself looks like this:

>>> import json
>>> dag_id = 'newprolab'
>>> url = 'http://{}:{}/{}/{}/{}'.format(host, airflow_port, 'api/experimental/dags', dag_id, 'dag_runs')
>>> data = {"conf": "{\"key\":\"value\"}"}
>>> headers = {'Content-type': 'application/json'}
>>> auth = ('newprolab', 'Newprolab2019!')
>>> uri = requests.post(url, data=json.dumps(data), headers=headers, auth=auth)
>>> uri.text
'{\n  "message": "Created <DagRun newprolab @ 2019-03-27 10:24:25+00:00: manual__2019-03-27T10:24:25+00:00, externally triggered: True>"\n}\n'

The request was processed successfully.
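
Wrapped into functions, the same request might look like the sketch below. It assumes, as in the session above, that conf is sent as a JSON-encoded string inside the JSON body and that the server uses password authentication. The names build_trigger_request and trigger_dag are hypothetical helpers, not Airflow functions.

```python
import json


def build_trigger_request(host, port, dag_id, conf=None):
    """Return (url, body, headers) for POST .../dags/<dag_id>/dag_runs."""
    url = 'http://{}:{}/api/experimental/dags/{}/dag_runs'.format(host, port, dag_id)
    # Assumption: conf travels as a JSON-encoded string, mirroring the
    # request shown in the article.
    body = json.dumps({'conf': json.dumps(conf or {})})
    headers = {'Content-type': 'application/json'}
    return url, body, headers


def trigger_dag(host, port, dag_id, auth, conf=None, timeout=10):
    """Send the trigger request and return the server's message, if any."""
    import requests  # third-party; imported lazily
    url, body, headers = build_trigger_request(host, port, dag_id, conf)
    response = requests.post(url, data=body, headers=headers,
                             auth=auth, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.json().get('message')


if __name__ == '__main__':
    print(build_trigger_request('localhost', 8081, 'newprolab', {'key': 'value'}))
```

Keeping the request construction separate from the network call makes it possible to test the URL and payload without a running Airflow instance.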

After that, we give the DAG some time to process the data and then query the ClickHouse table, trying to catch the control data packet.
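
"Giving the DAG some time" can be done with a polling loop instead of one fixed sleep. The following is a minimal sketch: wait_for is a hypothetical helper, and check stands for whatever function queries the ClickHouse table for the control packet (that query is not shown in this article, so it is left as a parameter). The timeout and interval defaults are illustrative assumptions.

```python
import time


def wait_for(check, timeout=300, interval=10,
             sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or timeout seconds pass.

    Returns the truthy result, or None if the deadline is reached first.
    check() is always called at least once.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


if __name__ == '__main__':
    attempts = []

    def fake_check():
        # Stand-in for a ClickHouse query; succeeds on the third attempt.
        attempts.append(1)
        return 'control packet found' if len(attempts) >= 3 else None

    print(wait_for(fake_check, timeout=5, interval=0.1))
```

The sleep and clock parameters exist only so that the loop can be exercised in tests without real waiting.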

The check is complete.

Source: will.com
