Deploy BERT to Azure ML Studio (HuggingFace Transformers)
Azure ML Studio is a powerful platform that allows us to manage our ML/AI projects in various ways, from simple “Level 0” manual deployments all the way to “Level 4” fully automated MLOps.
In this scenario, we will deploy our HuggingFace BERT model running on the transformers library from a local folder to an online endpoint using Azure ML SDK v2.
The two main advantages of using Azure ML online endpoints are:
- the ability to load-balance multiple deployments: you can mirror or funnel traffic between different models in a high-throughput (or production) environment without causing downtime or delays in your app (see the sketch right after this list),
- high observability: deployed containers come with Application Insights logging out of the box, so you don’t have to set anything up for logging.
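To illustrate the first point, here is a minimal sketch of splitting traffic between two deployments, assuming an endpoint with deployments named “blue” and “green” already exists and ml_client is an authenticated MLClient; the names and percentages are only examples:

# Send 90% of requests to "blue" and 10% to "green" without downtime
endpoint = ml_client.online_endpoints.get(name=endpoint_name)
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()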
Prerequisites:
- VSCode with Python notebook
- Azure CLI (installed and signed in)
- Docker Desktop
- Azure ML Studio account with a workspace created
How-to:
For the most part, all you have to do is follow this tutorial, so I’m not going to repeat it. Instead, I will just show you the parts where things are a bit different.
Project structure
- Create a folder for your deployment or project (e.g., bert).
- Inside that folder, create a Python notebook (e.g., deploy.ipynb); this will be our workbook.
- Create a folder called model and copy your model files into that folder.
- Copy this Conda file into your model folder. Edit that file to specify the desired Python and pip versions, and add all required libraries, including transformers and PyTorch. This file will be used to set up your model’s environment. (At the time of writing, the deployment image has a bug where it is missing the azureml-inference-server-http library, so you may want to add it to your conda file as well.)
- Copy this score.py file into your model folder. This file will be used to init() and run() your model when it’s deployed.
- Copy this sample-request.json file into the root of your project. Edit it to provide the desired request payload format for your model.
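If you follow these steps, the project should end up looking roughly like this (the names match the examples above; the exact model files depend on your checkpoint):

bert/
├── deploy.ipynb
├── sample-request.json
└── model/
    ├── conda.yaml
    ├── score.py
    └── (your model and tokenizer files)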
Configure deployment
When configuring the deployment in your Python notebook, provide your model folder and your conda.yaml file as follows:
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    Model,
)

model = Model(path="./model/")

env = Environment(
    conda_file="./model/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./model/", scoring_script="score.py"
    ),
    instance_type="Standard_DS1_v2",  # or "Standard_DS3_v2", etc.
    instance_count=1,
)
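The snippet above assumes that endpoint_name and an authenticated MLClient already exist earlier in the notebook. A minimal sketch of that setup, with placeholder subscription, resource group, workspace, and endpoint names, could look like this:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

# Placeholders: replace with your own subscription, resource group, and workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

endpoint_name = "bert-endpoint"  # hypothetical endpoint name
endpoint = ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")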
Scoring script
The last thing you need to do is modify your score.py script to initialize and run your BERT model using the transformers library. It should look something like this:
import json
import logging
import os

from torch import no_grad
from transformers import AutoModelForTokenClassification, AutoTokenizer


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory.
    """
    global model, tokenizer
    model = AutoModelForTokenClassification.from_pretrained(os.path.dirname(__file__), local_files_only=True)
    tokenizer = AutoTokenizer.from_pretrained(os.path.dirname(__file__), local_files_only=True)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    Here we extract the data from the JSON input, run the model, and return the result.
    """
    # TODO: add your deserialization and validation guards here
    # TODO: add your inference code here
    # TODO: add your return statement here
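To give an idea of what those TODOs could turn into, here is a minimal sketch of run() for token classification. It assumes sample-request.json sends a payload like {"text": "some sentence"}; that key is an illustrative choice, not a requirement:

def run(raw_data):
    # Deserialize the request; the "text" key is a hypothetical payload format
    data = json.loads(raw_data)
    text = data["text"]

    # Tokenize the input and run the model without tracking gradients
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with no_grad():
        outputs = model(**inputs)

    # Map each token's highest-scoring class id to its label name
    predictions = outputs.logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    labels = [model.config.id2label[p] for p in predictions]
    return json.dumps(list(zip(tokens, labels)))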
Once that’s done, you are ready to run your model locally. When the local deployment has succeeded, give it a quick test with Postman, and then you are ready to publish!
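For reference, here is a hedged sketch of that local-then-published flow with SDK v2, assuming ml_client, endpoint, and blue_deployment are the objects created above:

# Create the endpoint and deployment locally (this is what Docker Desktop is for)
ml_client.online_endpoints.begin_create_or_update(endpoint, local=True)
ml_client.online_deployments.begin_create_or_update(deployment=blue_deployment, local=True)

# Smoke-test the local container with the sample request
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="./sample-request.json",
    local=True,
)

# Once it works locally, publish to Azure by dropping local=True
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
ml_client.online_deployments.begin_create_or_update(deployment=blue_deployment).result()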