Experience installing Apache Airflow on Windows 10

Preamble: by a twist of fate, I moved from academic science (medicine) into information technology. Here I still rely on what I know about designing experiments and analyzing experimental data, but I now apply it with a technology stack that is new to me. Learning these technologies has brought a number of difficulties, which, fortunately, I have managed to overcome so far. Perhaps this post will be useful to others who are just getting started with Apache projects.

So, to the point. Inspired by Yuri Emelyanov's article on the capabilities of Apache Airflow for automating analytical procedures, I wanted to start using this set of libraries in my own work. For readers not yet familiar with Apache Airflow, the short overview article on the website of the N. E. Bauman National Library may be of interest.

The usual instructions for running Airflow don't seem to apply to a Windows environment, and using Docker would be overkill in my case, so I started looking for other solutions. Fortunately, I was not the first to take this path, and I found an excellent video tutorial on how to install Apache Airflow on Windows 10 without Docker. But, as often happens, following the recommended steps led to difficulties, and I suspect I am not the only one who has run into them. So I would like to share my experience installing Apache Airflow; maybe it will save someone some time.

Let's go through the steps of the tutorial (spoiler: everything went smoothly until step 5):

1. Installing the Windows Subsystem for Linux (needed to install a Linux distribution later)

This is the easiest part:

Control Panel → Programs → Programs and Features → Turn Windows features on or off → Windows Subsystem for Linux

2. Installing a Linux distribution of your choice

I used the Ubuntu app from the Microsoft Store.

3. Installing pip

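# provides the apt-add-repository command
sudo apt-get install software-properties-common
# enable the "universe" repository, which contains python-pip
sudo apt-add-repository universe
sudo apt-get update
sudo apt-get install python-pip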

4. Installing Apache Airflow

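# make python-slugify (an Airflow 1.10 dependency) use text-unidecode instead of
# the GPL-licensed unidecode; early 1.10 releases refuse to install without a choice
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow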

5. Initializing the database

And this is where my small difficulties began. The tutorial says to run the command airflow initdb and move on to the next step. However, I kept getting the response airflow: command not found. The logical assumption was that something had gone wrong while installing Apache Airflow and the necessary files were simply missing. After making sure that everything was where it should be, I tried specifying the full path to the airflow file (it should look like this: full/path/to/file/airflow initdb). No miracle happened, and the response was still airflow: command not found. Then I tried a path relative to my home directory (./.local/bin/airflow initdb), which produced a new error, ModuleNotFoundError: No module named 'werkzeug.wrappers.json'. It can be overcome by updating the werkzeug library (in my case, to version 0.15.4):

pip install werkzeug==0.15.4

You can read more about werkzeug in its official documentation.

After this small fix, the command ./.local/bin/airflow initdb completed successfully.
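The airflow: command not found message is typical when pip installs a package's command-line scripts into the per-user directory ~/.local/bin and that directory is not on PATH. Below is a minimal sketch for checking this before experimenting with full paths. It assumes bash and pip's default per-user location on Linux ($PYTHONUSERBASE if set, otherwise ~/.local); the function name find_console_script is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find where a pip-installed command-line script ended up.
# Assumes pip's per-user layout on Linux: $PYTHONUSERBASE/bin if set,
# otherwise ~/.local/bin. find_console_script is a hypothetical helper.
find_console_script() {
  local name="$1"
  local user_bin="${PYTHONUSERBASE:-$HOME/.local}/bin"

  # 1) Already reachable through PATH?
  if command -v "$name" >/dev/null 2>&1; then
    echo "on PATH: $(command -v "$name")"
    return 0
  fi

  # 2) Installed, but in a directory that PATH does not include?
  if [ -x "$user_bin/$name" ]; then
    echo "not on PATH, but found: $user_bin/$name"
    return 0
  fi

  # 3) Not found in either place.
  echo "not found: $name (checked PATH and $user_bin)" >&2
  return 1
}

if ! find_console_script airflow; then
  echo "hint: check that 'pip install apache-airflow' finished without errors" >&2
fi
```

If it reports the script in ~/.local/bin, you can either call it by that path, as in step 5, or add the directory to PATH, as in step 6.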

6. Starting the Airflow Server

The difficulties with running airflow did not end there. Running ./.local/bin/airflow webserver -p 8080 failed with the error No such file or directory. This is probably because the webserver launches helper programs (such as gunicorn) by name, and pip installed them into the same ~/.local/bin directory, which is not on PATH. An experienced Ubuntu user would likely fix this right away with export PATH=$PATH:~/.local/bin/ (that is, by appending ~/.local/bin/ to PATH, the list of directories searched for executables), but this post is for people who work mostly with Windows and may not find this solution obvious.

After this change, the command ./.local/bin/airflow webserver -p 8080 ran successfully (with the directory on PATH, plain airflow webserver -p 8080 works as well).
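The export command only affects the current terminal session; a new WSL window starts with the old PATH again. Below is a minimal sketch that applies the change now and also appends it to ~/.bashrc so later sessions pick it up. It assumes bash is the shell (the Ubuntu default); add_user_bin_to_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: make ~/.local/bin (where pip put airflow) part of PATH,
# now and in future bash sessions. add_user_bin_to_path is a hypothetical helper.
add_user_bin_to_path() {
  local user_bin="$HOME/.local/bin"
  local rc="$HOME/.bashrc"
  # Single quotes keep $PATH and $HOME unexpanded, so ~/.bashrc
  # re-evaluates them every time a new shell starts.
  local line='export PATH="$PATH:$HOME/.local/bin"'

  # Current session: append the directory only if it is not already there.
  case ":$PATH:" in
    *":$user_bin:"*) ;;
    *) export PATH="$PATH:$user_bin" ;;
  esac

  # Future sessions: add the export line to ~/.bashrc exactly once.
  touch "$rc"
  if ! grep -qxF "$line" "$rc"; then
    echo "$line" >> "$rc"
  fi
}

add_user_bin_to_path
```

Paste it into the WSL terminal or load it with source so that the export reaches the current shell; running it as a separate script would only update ~/.bashrc. Calling the function twice is harmless, since both the PATH check and the grep check skip work that is already done.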

7. Opening localhost:8080/ in a browser

If everything went well in the previous steps, you are now ready to conquer analytical heights.
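Before opening the browser, it can help to confirm from the WSL terminal that the webserver is actually answering. Below is a minimal sketch using curl. It assumes curl is installed (sudo apt-get install curl otherwise) and that the server was started with -p 8080 as in step 6; check_webserver is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check from the WSL terminal whether the Airflow webserver answers.
# Assumes curl is installed and the server was started with -p 8080.
# check_webserver is a hypothetical helper name.
check_webserver() {
  local url="${1:-http://localhost:8080/}"

  # -s: no progress meter; -o /dev/null: discard the page body;
  # --max-time 5: give up after 5 seconds. Without -f, curl exits with 0
  # for any HTTP response (including redirects), which is all we need here.
  if curl -s -o /dev/null --max-time 5 "$url"; then
    echo "webserver is answering at $url"
    return 0
  fi
  echo "no answer from $url" >&2
  return 1
}

if ! check_webserver "http://localhost:8080/"; then
  echo "hint: is 'airflow webserver -p 8080' still running in another terminal?" >&2
fi
```

Since airflow webserver keeps running in the foreground, run this check from a second terminal window.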

I hope that the experience of installing Apache Airflow on Windows 10 described above will be useful for novice users and will speed up their entry into the universe of modern analytics tools.

Next time I would like to continue the topic and share my experience of using Apache Airflow to analyze the behavior of mobile app users.

Source: habr.com
