Preamble: by a twist of fate I moved from academic science (medicine) into information technology, where I still rely on what I know about experimental design and strategies for analyzing experimental data, but have to apply a technology stack that is new to me. While learning these technologies I keep running into difficulties, which, fortunately, I have managed to overcome so far. Perhaps this post will be useful to those who are also just getting started with Apache projects.
So to the point. Inspired
Since the usual instructions for running Airflow do not seem to apply in a Windows environment, but rather use
Let's go through the steps of the instructions (spoiler: everything went fine until step 5):
1. Installing the Windows Subsystem for Linux for later installation of Linux distributions
This is the least of the problems, as they say:
Control Panel → Programs → Programs and Features → Turn Windows features on or off → Windows Subsystem for Linux
2. Install the Linux distribution of your choice
I used the app
3. Install and update pip
sudo apt-get install software-properties-common
sudo apt-add-repository universe
sudo apt-get update
sudo apt-get install python-pip
4. Installing Apache Airflow
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow
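Before moving on, it can help to confirm that the package really was installed for the pip in use. The sketch below assumes that pip is the same one used in step 4; pkg_status is a hypothetical helper name, not part of pip or Airflow.

```shell
# pkg_status is a hypothetical helper (not part of pip or Airflow).
# It asks pip whether a package is installed and prints a one-line verdict.
pkg_status() {
    if pip show "$1" >/dev/null 2>&1; then
        echo "installed: $1"
    else
        # This branch is also reached when pip itself is not on PATH.
        echo "not installed: $1"
    fi
}

pkg_status apache-airflow
```

Running pip show -f apache-airflow additionally lists the installed files, which shows where the airflow script ended up.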
5. Database initialization
This is where my small difficulties began. The instructions say to enter the command airflow initdb and move on to the next step. However, I kept getting the response airflow: command not found. It was logical to assume that something had gone wrong during the installation of Apache Airflow and the necessary files were simply missing. After making sure that everything was where it should be, I tried specifying the full path to the airflow file (it should look like this: full/path/to/file/airflow initdb). But no miracle happened, and the response was the same: airflow: command not found. I then tried a relative path to the file (./.local/bin/airflow initdb), which produced a new error, ModuleNotFoundError: No module named json', which can be fixed by updating the werkzeug library (in my case, to version 0.15.4):
pip install werkzeug==0.15.4
You can read more about werkzeug
After this simple fix, the command ./.local/bin/airflow initdb completed successfully.
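The message airflow: command not found means that the shell could not find the name airflow in any of the directories listed in PATH. A minimal sketch of checking this is shown here; where_is is a hypothetical helper name, and ~/.local/bin is assumed to be the directory where pip placed the script, as it was in the case described above.

```shell
# where_is is a hypothetical helper: report whether the shell can
# resolve a command name through PATH.
where_is() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $(command -v "$1")"
    else
        echo "not on PATH: $1"
    fi
}

where_is airflow

# Assumption: pip put the script into ~/.local/bin (a per-user install).
if [ -x "$HOME/.local/bin/airflow" ]; then
    echo "script exists: $HOME/.local/bin/airflow"
else
    echo "no executable airflow in $HOME/.local/bin"
fi
```

If the script exists but the name is not on PATH, the installation itself is fine and only the search path needs attention.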
6. Starting the Airflow Server
The difficulties with calling airflow did not end there. Running the command ./.local/bin/airflow webserver -p 8080 led to the error No such file or directory. An experienced Ubuntu user would probably solve this right away with the command export PATH=$PATH:~/.local/bin/ (that is, by adding ~/.local/bin/ to the existing PATH, the list of directories searched for executables), but this post is for those who work primarily with Windows and may not find this solution obvious.
After the change described above, the command ./.local/bin/airflow webserver -p 8080 started successfully.
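A likely explanation (an assumption, since the error text above does not say which file was missing) is that airflow webserver starts gunicorn by name, and gunicorn is installed in the same ~/.local/bin directory. Note also that export changes PATH only for the current terminal session. The sketch below adds the directory without creating duplicates; add_to_path is a hypothetical helper name.

```shell
# add_to_path is a hypothetical helper: append a directory to PATH
# only if it is not already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;       # append at the end of the search path
    esac
}

add_to_path "$HOME/.local/bin"
export PATH
```

To keep the change across sessions, the export line can be appended to ~/.bashrc (assuming bash is the shell in use, as on a default Ubuntu install).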
7. URL:
If everything went well in the previous stages, then you are ready to conquer the analytical heights.
I hope that the experience of installing Apache Airflow on Windows 10 described above will be useful for novice users and will speed up their entry into the universe of modern analytics tools.
Next time I would like to continue the topic and talk about using Apache Airflow to analyze the behavior of mobile application users.
Source: habr.com