How to overcome fear and start using Azure Machine Learning

I know many data scientists - and I'm probably one of them - who do their work on GPU machines, either local or hosted in the cloud as virtual machines, through a Jupyter Notebook or some kind of Python development environment. For two years, working as an AI/ML developer expert, I did exactly this: prepared data on an ordinary server or workstation, and ran training on a virtual machine with a GPU in Azure.

Of course, we've all heard about Azure Machine Learning - a dedicated cloud platform for machine learning. However, after a first glance at the introductory articles, it seems that Azure ML will create more problems for you than it solves. For example, in a typical introductory tutorial, training on Azure ML is launched from a Jupyter Notebook, while the training script itself is created and edited as a text file in one of the cells - without auto-completion, syntax highlighting, or the other advantages of a normal development environment. For this reason, we did not seriously use Azure ML in our work for a long time.

However, I recently discovered a way to start using Azure ML effectively in my work! Interested in the details?


The main secret is the Visual Studio Code extension for Azure ML. It allows you to develop training scripts right in VS Code, taking full advantage of the environment - you can even run the script locally and then simply send it off for training in an Azure ML cluster with a few clicks. Convenient, isn't it?

In doing so, you get the following benefits from using Azure ML:

  • You can work most of the time locally on your machine in a convenient IDE, and use a GPU only for model training. At the same time, the pool of training resources can automatically adjust to the required load, and by setting the minimum number of nodes to 0, you can have the virtual machine start "on demand" whenever training tasks appear.
  • You can store all training results in one place, including the achieved metrics and the resulting models - there is no need to come up with your own system for organizing and storing results.
  • Multiple people can work on the same project - they can use the same computing cluster, all experiments are queued up, and everyone can see the results of each other's experiments. One such scenario is using Azure ML for teaching deep learning: instead of giving each student a virtual machine with a GPU, you can create one cluster that everyone uses centrally. In addition, a shared table of results with model accuracy can serve as a good competitive element.
  • With Azure ML, you can easily run a series of experiments, for example for hyperparameter optimization - this can be done with a few lines of code, with no need to conduct the series manually.

I hope I convinced you to try Azure ML! Here's how to get started:

Azure ML Workspace and Azure ML Portal

Azure ML is organized around the concept of a workspace. Data can be stored in the workspace, experiments are submitted to it for training, and the training results - the resulting metrics and models - are stored there as well. You can see what is inside a workspace through the Azure ML portal, and from there you can perform many operations, from uploading data to monitoring experiments and deploying models.

You can create a workspace through the Azure portal web interface (see the step-by-step instructions), or from the Azure CLI command line (instructions):

az extension add -n azure-cli-ml
az group create -n myazml -l northeurope
az ml workspace create -w myworkspace -g myazml

Some computing resources (Compute) are also associated with the workspace. Once you have created a script to train a model, you can submit it as an experiment to the workspace and specify a compute target - the script will then be packaged, run in the desired computing environment, and all the results of the experiment will be saved in the workspace for further analysis and use.
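For orientation, this submit-an-experiment flow can also be driven from Python with the azureml-core SDK. The sketch below is wrapped in a function and not executed: the workspace, experiment, and cluster names are my own assumptions, and actually running it requires azureml-core installed plus access to an Azure subscription.

```python
# Sketch: submitting a training script to a workspace with the azureml-core SDK.
# Workspace/cluster names are assumptions; requires azureml-core and Azure access.
def submit_experiment():
    from azureml.core import Workspace, Experiment, ScriptRunConfig

    # Attach to an existing workspace (names from the CLI example above)
    ws = Workspace.get(name='myworkspace', resource_group='myazml')
    exp = Experiment(workspace=ws, name='mnist-train')

    config = ScriptRunConfig(
        source_directory='.',             # packaged and sent to the compute target
        script='train_universal.py',
        compute_target='my-cluster')      # an AmlCompute cluster in the workspace
    run = exp.submit(config)              # results land in the workspace
    return run
```

The VS Code extension described below does essentially the same thing behind the scenes, so you can switch between the two approaches freely.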

Training script for MNIST

Consider the classic problem of handwritten digit recognition using the MNIST dataset. In the same way, you will later be able to run any of your own training scripts.

Our repository contains the script train_local.py, in which we train the simplest logistic regression model using the Scikit-Learn library. Of course, I understand that this is not the best way to solve the problem - we use it as an example, because it is the simplest.

The script first downloads the MNIST data from OpenML, then uses the LogisticRegression class to train the model, and prints the resulting accuracy:

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# as_frame=False returns NumPy arrays rather than a pandas DataFrame
mnist = fetch_openml('mnist_784', as_frame=False)
mnist['target'] = np.array([int(x) for x in mnist['target']])

shuffle_index = np.random.permutation(len(mnist['data']))
X, y = mnist['data'][shuffle_index], mnist['target'][shuffle_index]

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, random_state=42)

lr = LogisticRegression()
lr.fit(X_train, y_train)
y_hat = lr.predict(X_test)
acc = np.average(np.int32(y_hat == y_test))

print('Overall accuracy:', acc)

You can run the script on your computer and get the result in a couple of seconds.
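Besides printing accuracy, you will usually also want to persist the trained model. In Azure ML, anything a run writes to its ./outputs folder is uploaded to the workspace automatically. Here is a minimal sketch of that pattern - it uses scikit-learn's small built-in digits dataset as a quick stand-in for MNIST so it runs in seconds, and the file name model.pkl is my own choice:

```python
# Sketch: saving a trained model to ./outputs, the folder Azure ML uploads
# to the workspace automatically at the end of a run. Uses sklearn's small
# built-in digits dataset as a quick stand-in for MNIST.
import os
import joblib
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
print('Test accuracy:', lr.score(X_test, y_test))

os.makedirs('outputs', exist_ok=True)          # Azure ML collects this folder
joblib.dump(lr, os.path.join('outputs', 'model.pkl'))
```

When run locally, the model simply ends up in a local outputs/ folder; when submitted as an experiment, the same file appears among the run's artifacts in the workspace.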

Run the script in Azure ML

If we run the training script through Azure ML, we will have two main advantages:

  • Running training on an arbitrary computing resource, which is usually more powerful than the local computer. Azure ML itself takes care of packaging our script, together with all the files in the current directory, into a Docker container, installing the required dependencies, and sending it for execution.
  • Writing results to a single registry inside the Azure ML workspace. To take advantage of this feature, we need to add a couple of lines of code to our script to record the resulting accuracy:

from azureml.core.run import Run
...
try:
    # succeeds only when the script runs as a submitted Azure ML experiment
    run = Run.get_submitted_run()
    run.log('accuracy', acc)
except Exception:
    # running locally - no run context to log to
    pass

The corresponding version of the script is called train_universal.py (it is slightly more elaborate than the excerpt above, but not much). This script can be run both locally and on a remote computing resource.
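In newer versions of the azureml-core SDK, Run.get_context() is the recommended way to obtain the current run. A hedged sketch of a helper along those lines, which logs to Azure ML when the SDK is available and falls back to printing when it is not (the helper name log_metric is my own, not an Azure ML API):

```python
# Sketch of a logging helper that works both inside and outside Azure ML.
# azureml-core may or may not be installed; without it we fall back to print.
def log_metric(name, value):
    """Log a metric to the current Azure ML run, or print it when run locally."""
    try:
        from azureml.core.run import Run
        run = Run.get_context()       # returns an offline run outside Azure ML
        run.log(name, value)
    except ImportError:
        print(f'{name}: {value}')     # azureml-core not installed - local run
    return value

log_metric('accuracy', 0.97)
```

With a helper like this, the training script itself stays identical for local debugging and for submitted experiments.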

To run it in Azure ML from VS Code, you need to do the following:

  1. Make sure the Azure extension is connected to your subscription. Select the Azure icon in the menu on the left. If you are not connected, a notification will appear in the lower right corner; click it to sign in through the browser. You can also press Ctrl-Shift-P to open the VS Code command palette and type Azure: Sign In.

  2. After that, in the Azure section (the icon on the left), find the MACHINE LEARNING section.

Here you should see the different groups of objects inside the workspace: computing resources, experiments, etc.

  3. Go to the file list, right-click on the script train_universal.py and select Azure ML: Run as experiment in Azure.

  4. A series of dialogs will follow in the VS Code command palette area: confirm the subscription and the Azure ML workspace you are using, and select Create new experiment.

  5. Choose Create New Compute to create a new compute resource:

    • Compute determines the computing resource on which training will take place. You can choose your local computer or an AmlCompute cloud cluster. I recommend creating a scalable cluster of STANDARD_DS3_v2 machines, with a minimum of 0 nodes (and a maximum of 1 or more, depending on your appetite). This can be done through the VS Code interface, or in advance through the ML portal.

  6. Next, select a Compute Configuration, which defines the parameters of the container created for training, in particular all the necessary libraries. In our case, since we are using Scikit-Learn, select SkLearn, and then just confirm the proposed list of libraries by pressing Enter. If you use any additional libraries, they must be specified here.

  7. A window will open with a JSON file describing the experiment. In it, you can adjust some parameters - for example, the experiment name. After that, click the Submit Experiment link right inside this file.

  8. After the experiment is successfully submitted through VS Code, a notification will appear on the right with a link to the Azure ML Portal, where you can track the status and results of the experiment.

You can always find the experiment later in the Experiments section of the Azure ML Portal, or in the MACHINE LEARNING section of the Azure pane in VS Code, in the list of experiments.

  9. If you then make some corrections to the code or change the parameters, re-running the experiment is much faster and easier. Right-clicking on the file shows a new menu item, Repeat last run - just select it and the experiment will start immediately.

You can always find the metrics from all runs on the Azure ML Portal; there is no need to write them down yourself.

Now you know that running experiments with Azure ML is simple and painless, and you get a number of nice benefits in doing so.

But you may also have noticed the disadvantages. For example, it took significantly longer to run the script: packaging the script into a container and deploying it on the server takes time. If, in addition, the cluster had been scaled down to 0 nodes, starting the virtual machine takes even more time, and all this is very noticeable when we experiment on simple tasks like MNIST, which are solved in a few seconds. However, in real life, when training lasts several hours, or even days or weeks, this extra time becomes insignificant, especially against the background of the much higher performance a computing cluster can provide.

What's next?

I hope that after reading this article, you can and will use Azure ML in your work to run scripts, manage computing resources, and store results centrally. However, Azure ML can give you even more benefits!

Inside the workspace, you can also store data, creating a centralized, easily accessible repository for all your tasks. In addition, you can run experiments not from Visual Studio Code but through the API - this can be especially useful if you need to perform hyperparameter optimization and run the script many times with different parameters. Moreover, Azure ML includes a special technology called Hyperdrive, which allows a more clever search and optimization of hyperparameters. I will talk about these possibilities in my next post.

Useful resources

To learn more about Azure ML, the Microsoft Learn courses on Azure Machine Learning are a good place to start.

Source: habr.com
