Cloud Storage is a common choice for Vertex AI and AI Platform users to store their training data, models, checkpoints and logs. Now, with Cloud Storage FUSE, training jobs on both platforms can access their data on Cloud Storage as files in the local file system.

This post introduces Cloud Storage FUSE for Vertex AI Custom Training. The feature works very similarly on AI Platform Training.

Cloud Storage FUSE provides three benefits over the traditional ways of accessing Cloud Storage:

Training jobs can start quickly without downloading any training data.
Training jobs can perform I/O easily at scale, without the friction of calling the Cloud Storage APIs, handling the responses, or integrating with client-side libraries.
Training jobs can leverage the optimized performance of Cloud Storage FUSE.

The problems

Traditionally, training jobs have two ways to use data from Cloud Storage.

They can use gsutil to download the entire dataset prior to training. This may take hours depending on the dataset size, which significantly slows down the start-up of the jobs.
They can call the Cloud Storage APIs directly or through an integrated client library. This adds considerable complexity to the training code, and with it the cost of development and maintenance.

Cloud Storage FUSE

Cloud Storage FUSE is a Filesystem in Userspace (FUSE) mounted on Vertex AI systems.

When you start a custom training job, the job sees a directory /gcs which contains all the Cloud Storage buckets as subdirectories. The job can access a subdirectory (i.e., a bucket) once it has been granted the necessary permissions.

For instance, a training job can read the file /gcs/example-bucket/data.csv to get the training data stored in the object gs://example-bucket/data.csv:

with open('/gcs/example-bucket/data.csv', 'r') as f:
  lines = f.readlines()

Training jobs can also write to the bucket:

with open('/gcs/example-bucket/epoch3.log', 'a') as f:
  f.write('success!\n')
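The mapping between an object URI and its mounted path is mechanical, so a job that receives gs:// URIs as arguments can translate them before opening files. The following is a minimal sketch of that translation. The helper name gcs_uri_to_fuse_path is hypothetical, and the /gcs mount point is taken from the description above.

```python
def gcs_uri_to_fuse_path(uri, mount_root="/gcs"):
    """Translate gs://bucket/object into the matching path under the FUSE mount."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    remainder = uri[len(prefix):]
    # The first path component must be a bucket name.
    if not remainder or remainder.startswith("/"):
        raise ValueError(f"missing bucket name in {uri!r}")
    return f"{mount_root}/{remainder}"


print(gcs_uri_to_fuse_path("gs://example-bucket/data.csv"))
# prints /gcs/example-bucket/data.csv
```

The returned path can be passed to open() exactly as in the examples above.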

Permissions

Users can assign service accounts to the training jobs to configure their permissions for the Cloud Storage buckets.

If no service account is assigned to the training job, the job can access all the buckets owned by the same project.
If the training job is assigned a service account that has Cloud Storage roles, the job has the permissions granted by those roles.

For instance, you may create a service account and grant it

storage.objectAdmin on bucket A, and
storage.objectViewer on bucket B.

If you assign it to your training job, your training job will be able to

read and write in bucket A, and
read only in bucket B.

The training job will fail with a “permission denied” error if it tries to write to bucket B.
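If a write to a read-only bucket should not bring down the whole job, the training code can catch the failure. The sketch below assumes the “permission denied” error surfaces in Python as an OSError with errno EACCES or EPERM. That is how Python reports such errors from a local file system, but it is an assumption here, not documented behavior of the mount. The helper name append_line is hypothetical.

```python
import errno


def append_line(path, line):
    """Append one line to path; return False if permission is denied."""
    try:
        with open(path, "a") as f:
            f.write(line + "\n")
        return True
    except OSError as err:
        # Assumption: a denied write is reported as EACCES or EPERM.
        if err.errno in (errno.EACCES, errno.EPERM):
            return False
        raise  # Any other I/O error is unexpected, so let it propagate.


# Hypothetical usage inside a training job:
# if not append_line("/gcs/example-bucket/epoch3.log", "success!"):
#     print("no write permission on example-bucket")
```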

Performance

I/O is often a bottleneck for training jobs with large datasets. Here are some tips to improve the read throughput of Cloud Storage FUSE:

Store data in large files to reduce the number of files used in the training. Fewer files mean less lookup overhead in locating and opening objects in Cloud Storage.
Use multiple threads. Higher concurrency utilizes the bandwidth better.
Keep the files warm. Files that are accessed frequently (i.e., warm files) are generally cached better and are faster to read.
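The multi-threading tip can be sketched with the standard library's thread pool. This is an illustration, not a tuned data loader: the function names are hypothetical, and the default of 8 workers is an arbitrary starting point that should be measured against your own workload.

```python
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one file fully and return its bytes."""
    with open(path, "rb") as f:
        return f.read()


def read_files_concurrently(paths, max_workers=8):
    """Read all paths using a pool of threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_file, paths))


# Hypothetical usage with a few large shard files under the mount:
# shards = read_files_concurrently(
#     ["/gcs/example-bucket/shard-0.bin", "/gcs/example-bucket/shard-1.bin"]
# )
```

Threads suit this case because each one spends most of its time waiting on I/O, so several reads can be in flight at once.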

Restrictions

Cloud Storage FUSE is not a POSIX-compliant file system. Some operations that are safe on a POSIX file system therefore produce unwanted results and should be avoided.

Directories:

The root directory /gcs is not readable. If you run ls /gcs, you will get an “Input/output error”. However, it is fine to list a bucket root, for example with ls /gcs/example-bucket.
Renaming a directory is not atomic. An interrupted rename leaves a partial result, with some files in the new directory and others still in the old one. A directory that contains too many files, directly or in its subdirectories, cannot be renamed.

Files:

Hard links are not supported.
File metadata such as ownership, permissions, mtime, and extended attributes is not supported. Do not rely on file metadata for training logic.
Flushing a file pushes the entire file to Cloud Storage, which is expensive. Closing a file triggers a flush, so avoid closing and flushing files frequently.
Concurrent writes to the same file lead to data corruption.
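One way to respect the last two restrictions is to collect output in memory, write it with a single open and close, and give each worker its own file. The following is a minimal sketch under those assumptions. The class BufferedLog, the helper worker_log_path, and the worker-N.log naming are all hypothetical.

```python
import os


class BufferedLog:
    """Collect log lines in memory and write them with one open and close."""

    def __init__(self, path):
        self.path = path
        self.lines = []

    def add(self, line):
        # Nothing touches the file system here, so no flush is triggered.
        self.lines.append(line)

    def write_out(self):
        # The file is opened and closed once, however many lines were added.
        with open(self.path, "w") as f:
            f.write("".join(line + "\n" for line in self.lines))


def worker_log_path(directory, worker_index):
    """Give each worker a separate file so no two workers write to the same one."""
    return os.path.join(directory, f"worker-{worker_index}.log")
```

A job might call add() after every epoch and write_out() once at the end, or at a few checkpoints, trading some durability of the log for fewer uploads.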

Logs

Cloud Storage FUSE produces logs that can help you diagnose errors in training.

First, follow the link to the Logs Explorer on the training job’s page in the Google Cloud console. In the explorer, you can run queries to inspect the logs generated by your training job.
Second, look for the logs with “gcsfuse” in the resource.labels.taskName property. For instance, the task name “workerpool0-0.gcsfuse” indicates that the log comes from the Cloud Storage FUSE mounted for the first worker “0” in the first worker pool “workerpool0”.
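The task-name pattern above can be turned into a query string to paste into the explorer. The sketch below only builds text. It uses the property name resource.labels.taskName as written above, together with the “:” (contains) and “=” (equals) comparison operators of the Logging query language. The function names are hypothetical, and the label name shown on your own log entries is worth checking before relying on the filter.

```python
def gcsfuse_task_name(pool_index, worker_index):
    """Build a task name such as workerpool0-0.gcsfuse."""
    return f"workerpool{pool_index}-{worker_index}.gcsfuse"


def gcsfuse_log_filter(pool_index=None, worker_index=None):
    """Return a filter for all gcsfuse logs, or for one worker's logs."""
    label = "resource.labels.taskName"  # property name as given in the text
    if pool_index is None or worker_index is None:
        # ":" matches any task name that contains the substring.
        return f'{label}:"gcsfuse"'
    # "=" matches one exact task name.
    return f'{label}="{gcsfuse_task_name(pool_index, worker_index)}"'


print(gcsfuse_log_filter())
# prints resource.labels.taskName:"gcsfuse"
print(gcsfuse_log_filter(0, 0))
# prints resource.labels.taskName="workerpool0-0.gcsfuse"
```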

What’s next

You can find more information on Cloud Storage FUSE in the documentation.

You can also find code samples using Cloud Storage FUSE for Vertex AI Custom Training:

https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/community-content

By: Oliver Zhuang (Software Engineer)
Source: Google Cloud Blog
