Managing experiments is one of the main challenges for data science teams.

Finding the best modeling approach that works for a particular problem requires both hypothesis testing and trial-and-error. Tracking development and outcomes using docs and spreadsheets is neither reliable nor easy to share. Consequently the process of ML development is severely affected.

Indeed, not having a tracking service leads to manual copy/pasting of the parameters and metrics. With an increasing number of experiments, a model builder won’t  be able to reproduce the data and model configuration that was used to train models. Consequently the model’s predictive behavior and performance changes cannot be verified.

This lack of information is even more impactful when you have different teams involved in several use cases.

At scale, the steps of an ML experiment need to be orchestrated using pipelines. But how can data science teams guarantee rapid iteration of experiments and better readiness at the same time without the benefit of having a centralized location to manage and validate the results?

Bottom line is that  it is  much harder to turn your model into an asset for the company and its business.

To address these challenges, we are excited to announce the general availability of Vertex AI Experiments, the managed experiment tracking service on Vertex AI.

Vertex AI Experiments is designed not only for tracking but for supporting seamless experimentation. The service enables you to track parameters, visualize and compare the performance metrics of your model and pipeline experiments. At the same time, Vertex AI Experiments provides an experiment lineage you can use to represent each step involved in arriving at the best model configuration.

In this blog, we’ll dive into how Vertex AI Experiments works, showcasing the features that enable you to:

  • track parameters and metrics of models trained locally using the Vertex AI SDK
  • create experiment lineage (for example, data preprocessing, feature engineering) of experiment artifacts that others within your team can reuse
  • record the training configuration of several pipeline runs

But before we dive deeper, let’s clarify what a Vertex AI Experiment is.

Run, Experiment and the Metadata service

In Vertex AI Experiments, you have two main concepts : run and experiment.

Runs are associated with  a particular training configuration that a data scientist took while solving a particular ML challenge. For each run you can record:

  • Parameters as key-value inputs of the run,
  • Summary and time-series metrics (the metrics recorded at the end of each epoch) as key-value outputs of the run,
  • Artifacts including input data, transformed datasets and trained models.

Because you may have multiple runs, you can organize them into an experiment, which is the top level container for everything that a practitioner does to solve a particular data science challenge.

Figure 1. Run, Experiment and the Metadata service


Notice that both runs and experiments leverages Vertex ML Metadata, which is a managed ML Metadata store based on the open source ML Metadata (MLMD) library developed by Google’s TensorFlow Extended team. It lets you record, analyze, debug, and audit metadata and artifacts produced during your ML journey. In the case of Vertex AI Experiments, it allows you to visualize the ML lineage of your ML experiment.

Now that you know what a Vertex AI Experiment is, let’s see how you can leverage its capabilities to address the potential challenges of tracking and managing your experiments at scale.

Comparing models trained and evaluated locally

As a Data scientist,  you probably will start training your model locally. To find the optimal modeling approach,  you would like to try out different configurations.

For example, if you are building a TensorFlow model, you would want to track data parameters  such as `buffer_size` or the `batch_size` of the , and model parameters such as layer name, the `learning_rate` of the optimizer , and `metrics` you want to optimize.

As soon as you try several configurations ,  you will need to evaluate the resulting model by generating metrics for further analysis.

With Vertex AI Experiments, you can easily create an experiment and log both parameters, metrics, and artifacts that are associated with your experiment runs by using both the Vertex AI section of the Google Cloud console and the Vertex AI Python SDK.

Figure 2. Create an experiment (console)


Notice also that the SDK provides a handy initialization method that allows you to create a TensorBoard instance using Vertex AI TensorBoard for logging model time series metrics. Below you can see how to start an experiment, log model parameters and track evaluation metrics both per epoch and at the end of the training session.

# Create Tensorboard instance and initialize Vertex AI client
vertex_ai_tb = vertex_ai.Tensorboard.create()
vertex_ai.init(experiment=my_tensorflow_experiment, experiment_tensorboard=vertex_ai_tb)

# Initialize the experiment
with vertex_ai.start_run("run-1") as run:

   # Log model parameters and build the model
   run.log_params({"learning_rate": 0.01})
   model = get_model(model_params=model_params)

   # Train the model
   history = train(model=model,

    # Log metrics recorded at the end of each epoch 
    for idx in range(0, history.params["epochs"]):
                "train_loss": history.history["loss"][idx],
                "train_accuracy": history.history["accuracy"][idx]})

    # Evaluate model and log evaluation metrics
    test_loss, test_accuracy = model.evaluate(test_dataset)
    vertex_ai.log_metrics({"test_loss": test_loss, "test_accuracy": test_accuracy})


Then you can analyze the result of the experiment by viewing the Vertex AI section of the Google Cloud console or by retrieving them in the notebook. This video shows what it would look like.

Figure 3. Compare models trained and evaluated locally with Vertex AI Experiments


Tracking model training experiment lineage

The model training is just a single step in an experiment. Some data preprocessing is also required that  others within your team may have written. For that reason, you need a way to easily integrate preprocessing steps and record the resulting dataset to reuse it along several experiment runs.

By leveraging the integration with Vertex ML Metadata, Vertex AI Experiments allows you to track the data preprocessing as part of the experiment lineage by running an Vertex ML Metadata execution in your experiment context. Here you can see how to use execution to integrate preprocessing code in a Vertex AI Experiments.

# Create the dataset artifact
raw_dataset_artifact = vertex_ai.Artifact.create(

# Initiate the preprocessing execution 
with vertex_ai.start_execution(schema_title="system.ContainerExecution",
    ) as exc:

    # Assign raw dataset as input artifact

    # Log preprocessing params
    vertex_ai.log_params({"delimiter": ",", 
                          "target_name": "target"})

    # Preprocessing 
    raw_df = pd.read_csv(raw_dataset_artifact.uri)
    preprocessed_df = your_preprocess_function(raw_df)

    # Log preprocessing metrics
    vertex_ai.log_metrics({"n_records": preprocessed_df.shape[0],"n_columns": preprocessed_df.shape[1]})

    # Record the preprocessed dataset as output artifact
    preprocessed_dataset_metadata = vertex_ai.Artifact.create(

Once the execution is instantiated, you start recording the data preprocessing step. You can assign the dataset as input artifact, consume the dataset in the preprocessing code and pass the preprocessed dataset as output artifact of the execution. Then, that preprocessing step and its dataset, are automatically recorded as part of the experiment lineage and they are ready to be consumed as input artifacts of different training run executions associated with the same experiment. This is how the training execution would look like with the resulting model uploaded as an model artifact after training successfully finished.

# Initiate the train execution
with vertex_ai.start_execution(schema_title="system.ContainerExecution", display_name="train") as exc:


    # Log data parameters
    vertex_ai.log_params({"target_name": "target", 
                          "test_size": 0.2, "random_state": 8})

    # Get training and testing data
    x_train, x_val, y_train, y_val = get_training_split(preprocessed_df[["feature1","feature2","feature3"]], preprocessed_df["target"], test_size=0.2, random_state=8)
    # Get and train model pipeline 
    pipeline = get_pipeline()
    trained_pipeline = train_pipeline(pipeline, x_train, y_train)

    # Evaluate model and log training metrics
    model_metrics = evaluate_model(trained_pipeline, x_val, y_val)

    # Upload Model
    loaded = save_model(trained_pipeline, "gs://vertex-ai-experiments-demo/model.joblib")

    #Record the trained model as output artifact
    model = vertex_ai.Model.upload(


Below you can see how to access data and model artifacts of an experiment run from the Vertex AI Experiments view and how the resulting experiment lineage would look like in the Vertex ML Metadata.

Figure 4. Tracking model training experiment  lineage


Comparing model training pipeline runs

Automating experimentation of a pipeline run is essential when you need to retrain your models frequently. The sooner you formalize your experiments in pipelines, the easier and faster it will be to move them to production. The diagram depicts a high level view of a rapid experimentation process.

Figure 5. Rapid experimentation process

As a data scientist, you formalize your experiment in a pipeline which will take in a number of parameters to train your model. Once you have your pipeline,  you need a way to track and evaluate pipeline runs at scale to determine which parameters configuration generates the best  performing model.

By leveraging the integration with Vertex AI Pipelines, Vertex AI Experiments lets you to track pipeline parameters, artifacts and metrics and compare pipeline runs.

All you need to do is declare the experiment name before submitting the pipeline job on Vertex AI.

# Define run specifications
runs = [
    {"max_depth": 4, "learning_rate": 0.2, "boost_rounds": 10},
    {"max_depth": 5, "learning_rate": 0.3, "boost_rounds": 20},
    {"max_depth": 3, "learning_rate": 0.1, "boost_rounds": 30},

# Submit multiple pipelines and track them as experiment runs
for i, parameter_values in enumerate(runs):

    job = vertex_ai.PipelineJob(


Then, as demonstrated below, you will be able to see your pipeline experiment run and its parameters and metrics in Vertex AI Experiments and compare it with previous runs to then promote the best training configuration to production. You can also see the relationship with your experiment run and monitor your pipeline run in Vertex AI Pipelines.  And because each run is mapped to a resource in Vertex ML Metadata, you will be able to explain your choice to others by showing the lineage automatically created on Vertex AI.

Figure 6. Track and comparing model training pipeline experiment runs



With Vertex AI Experiments you will be able not only to track parameters, visualize and compare performance metrics of your models, you will be able to build managed experiments that are ready to go to production quickly because of the ML pipeline and the metadata lineage integration capabilities of Vertex AI.

Now it’s your turn. While I’m thinking about the next blog post, check out notebooks in the official Github repo and the resources below to start getting your hands dirty. And remember…Always have fun!


By: Ivan Nardini (Customer Engineer) and May Hu (Product Manager, Google Cloud)
Source: Google Cloud Blog

Previous Making AI More Accessible For Every Business
Next Google Meet And Miro Come Together To Create Great Collaborative Moments