One does not simply productionalize machine learning models.

If you’ve developed models before, you know that input features usually need to be preprocessed before a model can consume them. Often this preprocessing is done by a separate application, which then sends the processed output to the prediction engine. This adds a layer of complexity when productionizing machine learning models and makes integrations difficult.

Fortunately, TensorFlow supports preprocessing layers which can be attached to any model so that preprocessing and prediction can all be performed in the same application.

In this post, we’ll download a vision model from TensorFlow Hub, attach an image preprocessing function, and upload it to Vertex AI’s prediction service, which will host our model in the cloud and let us make predictions with it through a REST endpoint. Not only does this make app development easier, but it also provides more flexibility and ease of use: clients can send their data as-is and let the model do the heavy lifting. Furthermore, Vertex AI’s prediction service lets us take advantage of hardware like GPUs and provides model monitoring and autoscaling.

Prefer doing everything in code from a Jupyter Notebook? Check out this colab.

Download a model from TensorFlow Hub

On https://tfhub.dev/ you’ll find lots of free models that process audio, text, video, and images. In this post, we’ll grab the CenterNet Object and Keypoints detection model. This model takes as input an image and returns object detection bounding boxes and detection keypoints. Detection keypoints are used to detect object parts, such as human body parts and joints.

On the CenterNet Object and Keypoints detection model page, click “Download” to grab the model in TensorFlow’s SavedModel format. You’ll download a zipped file that contains a directory laid out like so:

-saved_model.pb
-variables
    -variables.data-00000-of-00001
    -variables.index

Here the saved_model.pb file describes the structure of the saved neural network, and the data in the variables folder contains the network’s learned weights.
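
Before going further, it can help to confirm that the unzipped download really has this layout. The following is a minimal sketch using only Python’s standard library; the function name looks_like_saved_model is hypothetical and not part of TensorFlow.

```python
from pathlib import Path


def looks_like_saved_model(model_dir):
    """Return True if model_dir has the SavedModel layout described above."""
    root = Path(model_dir)
    variables = root / "variables"
    # saved_model.pb holds the graph structure.
    has_graph = (root / "saved_model.pb").is_file()
    # variables.index maps variable names to the weight shards.
    has_index = (variables / "variables.index").is_file()
    # Weight shards are named variables.data-XXXXX-of-YYYYY.
    has_shards = variables.is_dir() and any(variables.glob("variables.data-*-of-*"))
    return has_graph and has_index and has_shards
```

Calling looks_like_saved_model(".") from the unzipped directory should return True if nothing was lost during extraction.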

On the model’s hub page, you can see its example usage:

You feed the model an input Tensor and it returns a dictionary with the number of detected objects, the localized boxes, and the keypoint coordinates. Unfortunately, this model only accepts Tensor-like objects such as tf.Tensor objects and NumPy arrays. This makes life difficult for clients, because they have to write their own preprocessing logic before sending inputs to the model. The problem is worse in other languages such as JavaScript and Java.

TensorFlow preprocessing layers

TensorFlow models contain a signature definition which defines the signature of a computation supported in a TensorFlow graph. SignatureDefs aim to provide generic support to identify inputs and outputs of a function. If you’ve got TensorFlow installed on your computer, in the directory of the Hub model you downloaded, run:

saved_model_cli show --dir . --tag_set serve --signature_def serving_default

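For this model, that command lists the inputs and outputs of the serving_default signature, including the image tensor the model expects as input.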

We can modify this input layer with a preprocessing function so that clients can send base64-encoded images, which is a standard way of sending images through RESTful APIs. To do that, we’ll save a model with new serving signatures. The new signatures use Python functions to handle preprocessing the image from a JPEG to a Tensor.

import tensorflow as tf

# `model` is the Hub SavedModel loaded earlier, for example with
# tf.saved_model.load() on the directory you downloaded.
MODEL_W_PREPROCESSING = "model_with_preprocessing/"

def _preprocess(bytes_inputs):
    # Decode one JPEG byte string into a 512x512 RGB uint8 image tensor.
    decoded = tf.io.decode_jpeg(bytes_inputs, channels=3)
    resized = tf.image.resize(decoded, size=(512, 512))
    return tf.cast(resized, dtype=tf.uint8)

def _get_serve_image_fn(model):
    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def serve_image_fn(bytes_inputs):
        # Preprocess every image in the batch, then run the model on the result.
        decoded_images = tf.map_fn(_preprocess, bytes_inputs, fn_output_signature=tf.uint8)
        return model(decoded_images)
    return serve_image_fn

signatures = {
    "serving_default": _get_serve_image_fn(model).get_concrete_function(
        tf.TensorSpec(shape=[None], dtype=tf.string)
    )
}

tf.saved_model.save(model, MODEL_W_PREPROCESSING, signatures=signatures)

The base64 decoding is done natively by Vertex’s prediction endpoint service. More on that next.

Getting started with Vertex AI

Vertex AI is Google Cloud’s new platform for training, deploying and monitoring machine learning models and pipelines.

For this project we’ll use the prediction service, which will wrap our model in a convenient REST endpoint.

To get started, you’ll need a Google Cloud account with a GCP project set up. Next, you’ll need to create a Cloud Storage Bucket which is where you’ll upload the TensorFlow Hub model. You can do this from the command line using gsutil.

gcloud init # Sign in to your Google Cloud account/project
gsutil mb -l us-central1 gs://my-model-bucket # Create a new storage bucket
gsutil cp -r model_with_preprocessing gs://my-model-bucket/model_with_preprocessing # Upload the model

If this model is big, this could take a while!

In the side menu, enable the Vertex AI API.

Once your Hub model is uploaded to Cloud Storage, it’s straightforward to import and deploy the model into Vertex AI using the Google Cloud aiplatform Python SDK and gcloud CLI commands.

You can use Python’s standard package manager, pip, to install the SDK on your machine.

pip install --upgrade google-cloud-aiplatform

To install the gcloud CLI, follow the steps outlined on this page for your environment.

To create the model in Vertex AI, run the following from the command line.

gcloud ai models upload \
--region=us-central1 \
--project=$PROJECT_ID \
--display-name=object-detection \
--container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-5:latest \
--artifact-uri=$BUCKET_NAME/model_with_preprocessing

Don’t forget to set the PROJECT_ID and BUCKET_NAME values. BUCKET_NAME should be the full Cloud Storage URI of your bucket, for example gs://my-model-bucket.

Once this is done, you can create a Vertex AI endpoint using the gcloud CLI. Set REGION to the region you uploaded the model to (us-central1 in this post).

gcloud ai endpoints create \
--project=$PROJECT_ID \
--region=$REGION \
--display-name=object-detection-endpoint

Now that we have uploaded the model and created the Vertex AI endpoint, let’s grab the model and endpoint ids. These will be used for deployment.

The model id can be found from the Vertex AI console under Models.

Similarly, the endpoint id is found under Endpoints.

Optionally, you can run the following commands to fetch these values from the command line.

MODEL_ID=$(gcloud ai models list --region=$REGION --project=$PROJECT_ID | grep object-detection | cut -d' ' -f1)
ENDPOINT_ID=$(gcloud ai endpoints list --region=$REGION --project=$PROJECT_ID | sed -n 2p | cut -d' ' -f1)

With these values set, we deploy the model to the endpoint.

gcloud ai endpoints deploy-model $ENDPOINT_ID \
--project=$PROJECT_ID \
--region=$REGION \
--model=$MODEL_ID \
--display-name=object-detection-endpoint \
--traffic-split=0=100

This can take a while as Vertex AI provisions a machine to serve the model.
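
As an alternative to the gcloud commands above, the upload and deployment can also be sketched with the aiplatform Python SDK installed earlier. This is a minimal sketch, not the authors’ code: the function name upload_and_deploy is hypothetical, the machine type is an assumption you should adjust, and running it requires the SDK plus Google Cloud credentials.

```python
def upload_and_deploy(project_id, artifact_uri, region="us-central1"):
    """Upload the SavedModel and deploy it to a new endpoint with the Python SDK."""
    # Imported here so this module can be loaded without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=region)
    model = aiplatform.Model.upload(
        display_name="object-detection",
        # For example: gs://my-model-bucket/model_with_preprocessing
        artifact_uri=artifact_uri,
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-5:latest"
        ),
    )
    # deploy() creates a new endpoint when none is passed and returns it.
    # Assumption: n1-standard-4 is only a placeholder machine type.
    endpoint = model.deploy(machine_type="n1-standard-4")
    return endpoint
```

The returned endpoint object carries the endpoint’s resource name, so with this route there is no need to look up the ids by hand.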

Making predictions

When deployment is finished, we can start making predictions against our model. Vertex AI endpoints accept POST requests with a JSON body. We’ll base64-encode our image and send it to the model in the JSON body. If an input in the body is wrapped in a “b64” element, the prediction service knows to decode it before passing the bytes to the model.

Let’s try this out. Download an image from the web and save it in your local environment. Make sure the image is smaller than 1.5 megabytes (As of March 1, 2022, Vertex AI public endpoints impose request limits of this size in order to keep containers from crashing during heavy load times). Once you have an image ready, you can create a request body using the command line.

echo "{\"instances\": [{\"bytes_inputs\": {\"b64\": \"$(base64 < image2.jpg | tr -d '\n')\"}}]}" > instances.json
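
The same request body can be built in Python with the standard library, which also makes it easy to check the size limit mentioned above. This is a minimal sketch: the helper name build_request_body and the MAX_REQUEST_BYTES constant (1.5 MB read as 1.5 * 1024 * 1024 bytes) are assumptions for illustration, and bytes_inputs matches the argument name of the serving function defined earlier.

```python
import base64
import json

# Assumption: the 1.5 MB request limit is read as 1.5 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = int(1.5 * 1024 * 1024)


def build_request_body(image_bytes):
    """Wrap raw JPEG bytes in the JSON body sent to the endpoint."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = json.dumps({"instances": [{"bytes_inputs": {"b64": encoded}}]})
    # Base64 grows the payload by about a third, so check the encoded size.
    if len(body.encode("utf-8")) > MAX_REQUEST_BYTES:
        raise ValueError("request body exceeds the assumed 1.5 MB limit")
    return body


def write_request_file(image_path, out_path="instances.json"):
    """Read an image from disk and write the request body next to it."""
    with open(image_path, "rb") as image_file:
        body = build_request_body(image_file.read())
    with open(out_path, "w") as out_file:
        out_file.write(body)
```

For example, write_request_file("image2.jpg") produces the same instances.json as the shell command. Checking the encoded size matters because an image just under the limit on disk can exceed it once base64-encoded.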

To test our new endpoint, we can call it with curl.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/$PROJECT_ID/locations/us-central1/endpoints/$ENDPOINT_ID:predict \
-d @instances.json > results.json
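
For clients written in Python, the same call can be sketched with the standard library alone. The function names build_predict_request and predict are hypothetical; the URL pattern and headers mirror the curl command, and the access token is obtained by shelling out to the gcloud CLI, which is assumed to be installed and signed in.

```python
import json
import subprocess
import urllib.request


def build_predict_request(project_id, endpoint_id, access_token, body, region="us-central1"):
    """Build (but do not send) the POST request for the endpoint's predict method."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def predict(project_id, endpoint_id, body, region="us-central1"):
    """Send the request and return the parsed JSON response."""
    # Ask the gcloud CLI for a short-lived access token.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    request = build_predict_request(project_id, endpoint_id, token, body, region)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Splitting request construction from sending keeps the URL and header logic easy to inspect without making a network call.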

The resulting JSON contains the response spec described in the CenterNet Object and Keypoints detection model page.
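
As a rough illustration of working with that response, the sketch below keeps only detections above a score threshold. It rests on several assumptions: that the response has a top-level predictions list, that each prediction exposes detection_classes, detection_scores, and detection_boxes as parallel lists (key names taken from the model’s output spec), and that the sample data, which is made up, resembles the real shape. Check your own results.json before relying on it.

```python
def confident_detections(response, min_score=0.5):
    """Return (class_id, score, box) tuples scoring at least min_score, per prediction."""
    results = []
    for prediction in response["predictions"]:
        kept = [
            (int(class_id), score, box)
            for class_id, score, box in zip(
                prediction["detection_classes"],
                prediction["detection_scores"],
                prediction["detection_boxes"],
            )
            if score >= min_score
        ]
        results.append(kept)
    return results


# Made-up response for illustration only; real values come from results.json.
sample = {
    "predictions": [
        {
            "detection_classes": [1.0, 18.0],
            "detection_scores": [0.9, 0.2],
            "detection_boxes": [[0.1, 0.1, 0.8, 0.6], [0.0, 0.5, 0.3, 0.9]],
        }
    ]
}
print(confident_detections(sample))  # [[(1, 0.9, [0.1, 0.1, 0.8, 0.6])]]
```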

The TensorFlow Models GitHub repository has Python libraries to visualize these results on the images. If you would like to try them, take a look at this colab.

Now that we’ve set up our TensorFlow Hub model on Vertex AI, we can use it in our app without having to think about (most of) the performance and ops challenges of using big machine learning models in production. It’s a nice serverless way to get building with AI fast.

By: Juan Acevedo (Enterprise AI/ML Customer Engineer) and Dale Markowitz (Applied AI Engineer)
Source: Google Cloud Blog