aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Engineering

Introducing Model Co-Hosting To Enable Resource Sharing Among Multiple Model Deployments On Vertex AI

  • aster.cloud
  • August 1, 2022
  • 3 minute read

When deploying models to the Vertex AI prediction service, each model is by default deployed to its own VM. To make hosting more cost effective, we’re excited to introduce model co-hosting in public preview, which allows you to host multiple models on the same VM, resulting in better utilization of memory and computational resources. The number of models you choose to deploy to the same VM will depend on model sizes and traffic patterns, but this feature is particularly useful for scenarios where you have many deployed models with sparse traffic.

Understanding the Deployment Resource Pool

Co-hosting model support introduces the concept of a Deployment Resource Pool, which groups together models to share resources within a VM. Models can share a VM if they share an endpoint, but also if they are deployed to different endpoints.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

For example, let’s say you have four models and two endpoints, as shown in the image below.

 

Model_A, Model_B, and Model_C are all deployed to Endpoint_1 with traffic split between them. And Model_D is deployed to Endpoint_2, receiving 100% of the traffic for that endpoint.

Instead of having each model assigned to a separate VM, we can group Model_A and Model_B to share a VM, making them part of DeploymentResourcePool_X. We can also group models that are not on the same endpoint, so Model_C and Model_D can be hosted together in DeploymentResourcePool_Y.

Note that for this first release, models in the same resource pool must also have the same container image and version of the Vertex AI pre-built TensorFlow prediction containers. Other model frameworks and custom containers are not yet supported.

Read More  Google Cloud Is A Leader In The Forrester Wave: Public Cloud Container Platforms, Q1 2022

Co-hosting models with Vertex AI Predictions

You can set up model co-hosting in a few steps. The main difference is that you’ll first create a DeploymentResourcePool, and then deploy your model within that pool.

Step 1: Create a DeploymentResourcePool

You can create a DeploymentResourcePool with the following command. There’s no cost associated with this resource until the first model is deployed.

 

PROJECT_ID={YOUR_PROJECT}
REGION="us-central1"
VERTEX_API_URL=REGION + "-aiplatform.googleapis.com"
VERTEX_PREDICTION_API_URL=REGION + "-prediction-aiplatform.googleapis.com"
MULTI_MODEL_API_VERSION="v1beta1"

# Give the pool a name
DEPLOYMENT_RESOURCE_POOL_ID="my-resource-pool"

CREATE_RP_PAYLOAD = {
 "deployment_resource_pool":{
   "dedicated_resources":{
     "machine_spec":{
       "machine_type":"n1-standard-4"
     },
     "min_replica_count":1,
     "max_replica_count":2
   }
 },
 "deployment_resource_pool_id":DEPLOYMENT_RESOURCE_POOL_ID
}
CREATE_RP_REQUEST=json.dumps(CREATE_RP_PAYLOAD)

!curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://{VERTEX_API_URL}/{MULTI_MODEL_API_VERSION}/projects/{PROJECT_ID}/locations/{REGION}/deploymentResourcePools \
-d '{CREATE_RP_REQUEST}'

 

Step 2: Create a model

Models can be imported to the Vertex AI Model Registry at the end of a custom training job, or you can upload them separately if the model artifacts are saved to a Cloud Storage bucket. You can upload a model through the UI or with the SDK using the following command:

 

# REPLACE artifact_uri with GCS path to your artifacts
my_model = aiplatform.Model.upload(display_name='text-model-1',
                                  artifact_uri=’gs://{YOUR_GCS_BUCKET}’,
                                  serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest')

 

When the model is uploaded, you’ll see it in the model registry. Note that the deployment status is empty since the model hasn’t been deployed yet.

 

Step 3: Create an endpoint

Next, create an endpoint via the SDK or the UI. Note that this is different from deploying a model to an endpoint.

endpoint = aiplatform.Endpoint.create('cohost-endpoint')

When your endpoint is created, you’ll be able to see it in the console.

 

Step 4: Deploy Model in a Deployment Resource Pool

The last step before getting predictions is to deploy the model within the DeploymentResourcePool you created.

Read More  Data Modernization With Google Cloud And MongoDB Atlas

 

MODEL_ID={MODEL_ID}
ENDPOINT_ID={ENDPOINT_ID}

MODEL_NAME = "projects/{project_id}/locations/{region}/models/{model_id}".format(project_id=PROJECT_ID, region=REGION, model_id=MODEL_ID)
SHARED_RESOURCE = "projects/{project_id}/locations/{region}/deploymentResourcePools/{deployment_resource_pool_id}".format(project_id=PROJECT_ID, region=REGION, deployment_resource_pool_id=DEPLOYMENT_RESOURCE_POOL_ID)

DEPLOY_MODEL_PAYLOAD = {
 "deployedModel": {
   "model": MODEL_NAME,
   "shared_resources": SHARED_RESOURCE
 },
 "trafficSplit": {
   "0": 100
 }
}
DEPLOY_MODEL_REQUEST=json.dumps(DEPLOY_MODEL_PAYLOAD)
pp.pprint("DEPLOY_MODEL_REQUEST: " + DEPLOY_MODEL_REQUEST)

!curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://{VERTEX_API_URL}/{MULTI_MODEL_API_VERSION}/projects/{PROJECT_ID}/locations/{REGION}/endpoints/{ENDPOINT_ID}:deployModel \
-d '{DEPLOY_MODEL_REQUEST}'

 

When the model is deployed, you’ll see it ready in the console. You can deploy additional models to this same DeploymentResourcePool for co-hosting using the same endpoint we created already, or using a new endpoint.

 

Step 5: Get a prediction

Once the model is deployed, you can call your endpoint in the same way you’re used to.

x_test= ['The movie was spectacular. Best acting I’ve seen in a long time and a great cast. I would definitely recommend this movie to my friends!'] endpoint.predict(instances=x_test)

What’s next

You now know the basics of how to co-host models on the same VM. For an end to end example, check out this codelab, or refer to the docs for more details. Now it’s time for you to start deploying some models of your own!

 

 

By: Nikita Namjoshi (Developer Advocate)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Artificial Intelligence
  • Google Cloud
  • Resource Management
  • Tutorials
  • Vertex AI
  • Virtual Machines
You May Also Like
View Post
  • Engineering

Just make it scale: An Aurora DSQL story

  • May 29, 2025
View Post
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • March 9, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024

Stay Connected!
LATEST
  • 1
    Just make it scale: An Aurora DSQL story
    • May 29, 2025
  • 2
    Reliance on US tech providers is making IT leaders skittish
    • May 28, 2025
  • Examine the 4 types of edge computing, with examples
    • May 28, 2025
  • AI and private cloud: 2 lessons from Dell Tech World 2025
    • May 28, 2025
  • 5
    TD Synnex named as UK distributor for Cohesity
    • May 28, 2025
  • Weigh these 6 enterprise advantages of storage as a service
    • May 28, 2025
  • 7
    Broadcom’s ‘harsh’ VMware contracts are costing customers up to 1,500% more
    • May 28, 2025
  • 8
    Pulsant targets partner diversity with new IaaS solution
    • May 23, 2025
  • 9
    Growing AI workloads are causing hybrid cloud headaches
    • May 23, 2025
  • Gemma 3n 10
    Announcing Gemma 3n preview: powerful, efficient, mobile-first AI
    • May 22, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • Understand how Windows Server 2025 PAYG licensing works
    • May 20, 2025
  • By the numbers: How upskilling fills the IT skills gap
    • May 21, 2025
  • 3
    Cloud adoption isn’t all it’s cut out to be as enterprises report growing dissatisfaction
    • May 15, 2025
  • 4
    Hybrid cloud is complicated – Red Hat’s new AI assistant wants to solve that
    • May 20, 2025
  • 5
    Google is getting serious on cloud sovereignty
    • May 22, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.