While many people still associate deep learning with academic research, Snap Inc. applies deep learning models every day to improve its recommendation engines. Using Google Cloud Tensor Processing Units (TPUs), Snap has accelerated its pace of innovation and model improvement to enhance the user experience.

Snap’s blog post “Training Large-Scale Recommendation Models with TPUs” tells the story of how Snap’s ad ranking team used Google’s leading-edge TPUs to train deep learning models quickly and efficiently. But there’s more to the story than the how, and that’s what we’re sharing here.

Faster leads to better

Snap’s ad ranking team is charged with training the models that make sure the right ad is served to the right Snapchatter at the right time. With more than 300 million daily users and millions of ads to rank, training models quickly and efficiently is a large part of a Snap ML engineer’s daily workload. It’s simple, really: the more models Snap’s engineers can train, the more likely they are to find better-performing ones, and the less it costs to find them. Better ad recommendation models translate into more relevant ads for users, driving greater engagement and improving conversion rates for advertisers.

Over the past decade, the hardware used to train large ML models like the ones Snap uses for ad ranking has evolved tremendously: from general-purpose multicore central processing units (CPUs), to graphics processing units (GPUs), to TPUs.

TPUs are Google’s custom-developed application-specific integrated circuits (ASICs) for accelerating ML workloads. They are designed from the ground up to minimize time to accuracy when training large models. Models that previously took weeks to train on other hardware platforms can now be trained in hours on TPUs, a product of Google’s leadership and experience in machine learning (Snap’s blog digs into the technology).

Benchmarking success

Snap wanted to see for itself what kind of training-speed improvements TPUs might deliver, so the team benchmarked model training on TPUs against both GPUs and CPUs. The results were impressive. Compared with TPUs, GPUs delivered 67 percent lower throughput at 52 percent higher cost. TPU-based training also drastically outperformed CPU-based training for Snap’s most common models: for its standard ad recommendation model, TPUs cut processing costs by as much as 74 percent and increased throughput by as much as 250 percent, with the same level of accuracy.
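Percentage comparisons like these depend on which platform is the baseline. The short Python sketch below converts each “changed by X percent” figure into a ratio and restates the GPU results with GPUs, rather than TPUs, as the baseline. It uses only the figures quoted above; the helper names are illustrative, and it assumes that “increased throughput by 250 percent” means 3.5 times the CPU throughput (if 2.5 times was meant, adjust that line).

```python
def ratio_from_percent_change(pct: float) -> float:
    """Turn "changed by pct percent" into a ratio of the new value to the baseline."""
    return 1.0 + pct / 100.0


def flip_baseline(ratio: float) -> float:
    """Re-express a new/baseline ratio with the roles swapped (baseline/new)."""
    return 1.0 / ratio


def to_percent_change(ratio: float) -> float:
    """Turn a new/baseline ratio back into a percent change from the baseline."""
    return (ratio - 1.0) * 100.0


# GPU results as reported, with TPUs as the baseline:
# 67 percent lower throughput and 52 percent higher cost.
gpu_vs_tpu_throughput = ratio_from_percent_change(-67)  # about 0.33
gpu_vs_tpu_cost = ratio_from_percent_change(52)         # about 1.52

# The same results restated with GPUs as the baseline.
tpu_vs_gpu_throughput = flip_baseline(gpu_vs_tpu_throughput)
tpu_vs_gpu_cost = flip_baseline(gpu_vs_tpu_cost)

# TPU vs. CPU on the standard ad model (best case), CPUs as the baseline.
# Assumption: "increased by 250 percent" means 3.5x the CPU throughput.
tpu_vs_cpu_throughput = ratio_from_percent_change(250)  # 3.5
tpu_vs_cpu_cost = ratio_from_percent_change(-74)        # about 0.26

if __name__ == "__main__":
    print(f"TPU throughput vs. GPU: {tpu_vs_gpu_throughput:.2f}x "
          f"({to_percent_change(tpu_vs_gpu_throughput):+.0f}%)")
    print(f"TPU cost vs. GPU: {tpu_vs_gpu_cost:.2f}x "
          f"({to_percent_change(tpu_vs_gpu_cost):+.0f}%)")
    print(f"TPU vs. CPU: {tpu_vs_cpu_throughput:.2f}x throughput, "
          f"{tpu_vs_cpu_cost:.2f}x cost")
```

Restated with GPUs as the baseline, the same benchmark says TPUs delivered about 3 times the throughput (roughly 203 percent more) at about 34 percent lower cost, which is why the direction of a percentage comparison matters when reading these figures.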

Because the TPU embedding API is a native, optimized solution for embedding-based operations, it performs embedding computations and lookups efficiently. This is particularly valuable for recommendation models, which have additional requirements such as fast embedding lookups and high memory bandwidth.
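At its core, an embedding lookup is a gather-and-pool: for each sparse feature ID (for example, a hypothetical list of ad IDs a user has seen), fetch the matching row from a large table and combine the rows into one dense vector. The pure-Python sketch below illustrates only that operation; it is not the TPU embedding API, and the function names and the choice of a mean combiner are assumptions for illustration. With tables of millions of rows, each lookup is essentially a random memory read, which is why memory bandwidth matters so much for recommenders.

```python
from typing import List, Sequence

Vector = List[float]


def embedding_lookup_mean(table: Sequence[Sequence[float]], ids: Sequence[int]) -> Vector:
    """Gather the embedding rows for `ids` and average them into one vector.

    In a production recommender the table can hold millions of rows, so each
    id is effectively a random read from memory; throughput is bounded by
    memory bandwidth more than by arithmetic.
    """
    if not ids:
        raise ValueError("ids must be non-empty")
    dim = len(table[0])
    pooled = [0.0] * dim
    for i in ids:
        row = table[i]  # gather: one row lookup per feature id
        for d in range(dim):
            pooled[d] += row[d]  # pool: accumulate element-wise
    return [v / len(ids) for v in pooled]


def batch_lookup(table: Sequence[Sequence[float]], batch: Sequence[Sequence[int]]) -> List[Vector]:
    """Apply the lookup to every example in a batch of variable-length id lists."""
    return [embedding_lookup_mean(table, ids) for ids in batch]


if __name__ == "__main__":
    # A toy 3-row, 2-dimensional embedding table (hypothetical values).
    table = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    print(batch_lookup(table, [[0, 2], [0]]))
```

For the toy table, `batch_lookup(table, [[0, 2], [0]])` returns `[[2.0, 3.0], [0.0, 1.0]]`: the first example averages rows 0 and 2, and the second is just row 0.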

Benefits across the board

For Snap’s ad ranking team, these improvements translate into tangible workflow advantages. It’s not unusual for Snap to work with a month’s worth of data: logs of which users were shown which ads, along with a record of whether each user interacted with the ad. That means millions of data points to process, and Snap wants to model them as quickly as possible so it can make better recommendations going forward. The process is iterative: the faster Snap gets results from one experiment, the sooner its engineers can spin up the next one to improve on it, and they’d much rather do that in hours than in days.
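Data like this maps naturally onto binary classification: each logged impression becomes one training example, labeled by whether the user interacted with the ad. The sketch below assumes a deliberately minimal, hypothetical log schema (`AdImpression` with `user_id`, `ad_id`, and `interacted`); Snap’s actual log format and features are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AdImpression:
    """One logged impression: a user was shown an ad (hypothetical schema)."""
    user_id: int
    ad_id: int
    interacted: bool  # whether the user interacted with the ad


Example = Tuple[Dict[str, int], int]


def to_training_examples(logs: List[AdImpression]) -> List[Example]:
    """Convert impression logs into (features, label) pairs for a binary classifier.

    The label is 1 if the user interacted with the ad and 0 otherwise.
    """
    return [
        ({"user_id": r.user_id, "ad_id": r.ad_id}, int(r.interacted))
        for r in logs
    ]


def positive_rate(examples: List[Example]) -> float:
    """Fraction of impressions that led to an interaction."""
    if not examples:
        raise ValueError("examples must be non-empty")
    return sum(label for _, label in examples) / len(examples)


if __name__ == "__main__":
    logs = [
        AdImpression(user_id=1, ad_id=10, interacted=True),
        AdImpression(user_id=1, ad_id=11, interacted=False),
        AdImpression(user_id=2, ad_id=10, interacted=False),
        AdImpression(user_id=3, ad_id=12, interacted=True),
    ]
    examples = to_training_examples(logs)
    print(examples[0], positive_rate(examples))
```

With real logs the same transformation simply runs over millions of records, which is where fast training hardware pays off on every iteration.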

Increased efficiency and velocity benefit Snapchatters, too. Better models predict more accurately how likely a given user is to interact with a particular ad, which improves the user experience and boosts engagement. Higher engagement leads to higher conversion rates and greater value for advertisers, and given the volume of ads and users Snap handles, even a one percent improvement has a real monetary impact.
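Predicting “how likely” a user is to interact usually means the model outputs a probability, and a common way to score such predictions is log loss (binary cross-entropy), where lower is better. The post does not say which metric Snap uses; the sketch below assumes log loss, and a model that emits a raw score (logit) turned into a probability with the sigmoid function.

```python
import math
from typing import Sequence


def sigmoid(logit: float) -> float:
    """Map a raw model score to a probability between 0 and 1 (overflow-safe)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # logit < 0, so this cannot overflow
    return z / (1.0 + z)


def log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; labels are 0/1, probs are predicted P(interaction)."""
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 0]  # the user interacted with the first ad, not the second
    print(log_loss(labels, [0.5, 0.5]))  # an uninformative model
    print(log_loss(labels, [0.9, 0.1]))  # a sharper, correct model
```

A model that assigns 0.9 to the ad the user did interact with and 0.1 to the one they ignored scores about 0.105, well below the roughly 0.693 of a model that always predicts 0.5; better predictions of this kind are what the paragraph above ties to engagement.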

Working at the leading edge

Snap is working hard to improve its recommendation quality with the goal of delivering greater value to advertisers and a better experience for Snapchatters. That includes going all-in on leading-edge solutions like Google TPUs that allow its talented ML engineers to shine.

Now that you know the whole story, see how Snap got there with Google’s help in “Training Large-Scale Recommendation Models with TPUs.”

By: Aymeric Damien (Machine Learning Engineer, Snap Inc.) and Samir Ahmed (Software Engineer, Snap Inc.)
Source: Google Cloud Blog

Snap Inc. Adopts Google Cloud TPU For Deep Learning Recommendation Models
