  • Data
  • Design
  • Engineering

Tokopedia’s Journey To Creating A Customer Data Platform (CDP) On Google Cloud Platform

  • aster.cloud
  • December 14, 2021
  • 6 minute read

Founded in 2009, Tokopedia is an ecommerce platform that enables millions of Indonesians to transact online. As the company grows, there is an urgent need to better understand customers' behavior in order to improve their experience across the platform. Tokopedia now has more than 100 million Monthly Active Users, each with different demographics and preferences. One way to meet their needs is through personalization.

Normally, a user needs to browse through thousands of products to find the item they are looking for. By creating product recommendations that are relevant to each user, we shorten their search journey and hopefully increase conversion early on in the journey. To build personalization, teams need access to users' attributes, and the Data Engineering team's Customer Data Platform (CDP) provides that access. These attributes, developed by the Data Engineering team, come in handy for different use cases across functions and teams.



Previously, two main challenges were observed:

  1. The need for speed and answers caused an increase in data silos. As the need for personalization increased across the company, different teams built their own personalization features. However, limited time and the need to simplify communication across teams led each team to create its own data pipeline. This caused redundancies, because similar data was developed by different teams, and these redundancies slowed the development of new personalized features even when some of the attributes had already been built in a different module.
  2. Inconsistent data definitions. As each team created its own data pipeline, there were many cases where teams defined a user's attributes differently. On several occasions, this caused misunderstandings during meetings and unsynchronized user journeys, because different teams applied different attribute values to the same user. For example, team A evaluated user_id 001 as a woman in her 20s. Meanwhile, team B, having a different set of attributes and definitions, evaluated user_id 001 as a woman in her 30s. These differences in definitions and attributes can lead to different conclusions and results, and consequently to different personalizations. As a result, customers might have an inconsistent, and therefore poor, experience during their journey on Tokopedia. Imagine being shown content related to college necessities in one module and then content related to mom and baby in another.

 

Previous State of Data Distribution

 

Now, with the CDP, different teams do not have to constantly rebuild the infrastructure. Each attribute only needs to be processed once and can then be used by different teams across the company. This optimizes development time, cost, and effort. Another advantage of having a CDP is a single definition of attributes across services and teams. Since different teams look at the same attributes inside the CDP, this reduces the chances of misunderstanding and strengthens synchronization between teams. It also gives customers a consistent experience across the Tokopedia platform and enables teams to display relevant content.

CDP High level Concept

 

Several key factors are required to build a CDP at Tokopedia. The journey is as follows:

1. Define and Make a List of Attributes
During this phase, we worked with the Product and Analyst teams to define all of the user attributes required to build the CDP. Our product team interviewed several stakeholders to understand different perspectives on user attributes. As a result, an initial attribute list was made that included gender, age group, location, etc. This process was repeated iteratively to arrive at the best understanding of the user attributes.
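
One way to capture the outcome of this phase is a shared list of attribute definitions that every team reads from. The following is only a minimal sketch: the `AttributeDefinition` structure, the `ATTRIBUTES` registry, and the age brackets are hypothetical and are not taken from Tokopedia's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AttributeDefinition:
    """One agreed-upon definition of a user attribute (hypothetical structure)."""
    name: str
    description: str
    derive: Callable[[dict], str]  # computes the value from a raw user record


def derive_age_group(user: dict) -> str:
    # Illustrative brackets only; the real boundaries are not given in the article.
    age = user["age"]
    if age < 20:
        return "under_20"
    if age < 30:
        return "20s"
    if age < 40:
        return "30s"
    return "40_plus"


# A single registry shared by every team, so one user gets one age group everywhere.
ATTRIBUTES: Dict[str, AttributeDefinition] = {
    "age_group": AttributeDefinition(
        name="age_group",
        description="Decade bracket derived from the user's age",
        derive=derive_age_group,
    ),
    "gender": AttributeDefinition(
        name="gender",
        description="Self-declared gender from the user profile",
        derive=lambda user: user.get("gender", "unknown"),
    ),
}


def compute_attributes(user: dict) -> Dict[str, str]:
    """Evaluate every registered attribute for one user record."""
    return {name: definition.derive(user) for name, definition in ATTRIBUTES.items()}
```

Because every consumer calls the same `derive` function, two teams can no longer disagree on whether the same user is in their 20s or 30s.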

 

2. Platform Design
After comprehensive reviews, we decided to build our CDP using several components of the GCP tech stack.

CDP Architecture

 

BigQuery was chosen as the analytics backend of our CDP self-service. Meanwhile, Google Cloud Bigtable was selected as the backend that our services interact with to enable personalization. In developing the storage for Bigtable, the design of the schema is very important. The frequency and categorization affect how we design the column qualifier, while the CDP attribute affects how we design the row key.
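
The article does not publish the actual Bigtable schema, so the layout below is purely an assumption meant to illustrate how row keys and column qualifiers can be composed from those inputs. The function names, the `#` and `:` separators, and the ordering of the parts are all hypothetical.

```python
def build_row_key(user_id: str, attribute_group: str) -> bytes:
    """Hypothetical row key: the user first, then the group of CDP attributes."""
    return f"{user_id}#{attribute_group}".encode("utf-8")


def build_column_qualifier(category: str, frequency: str, attribute: str) -> bytes:
    """Hypothetical qualifier: category and refresh frequency, then the attribute name."""
    return f"{category}:{frequency}:{attribute}".encode("utf-8")


def to_cell(user_id, attribute_group, category, frequency, attribute, value):
    """Return (row_key, column_qualifier, value) as bytes, the form Bigtable stores."""
    return (
        build_row_key(user_id, attribute_group),
        build_column_qualifier(category, frequency, attribute),
        str(value).encode("utf-8"),
    )
```

For example, `to_cell("001", "demographic", "profile", "daily", "age_group", "20s")` yields the row key `b"001#demographic"` and the qualifier `b"profile:daily:age_group"`. Putting the user ID first keeps all rows for one user adjacent, which is one common choice rather than a confirmed detail of this CDP.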

We also opted to create a caching mechanism to reduce the load on Bigtable for similar read activity. We built the cache using Redis with a Time to Live (TTL) on entries to keep performance optimized. In addition, we applied a Role-Based Access Control (RBAC) mechanism on the CDP API to control which services can access which attributes in the CDP.
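
The combination of a TTL cache and role-based filtering can be sketched as follows. This is an illustration under stated assumptions, not the CDP's code: `TTLCache` is an in-memory stand-in for Redis, and `AttributeService`, the role names, and the backend callable are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Set, Tuple


class TTLCache:
    """In-memory stand-in for Redis: each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired, so behave like a cache miss
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


class AttributeService:
    """Read-through cache in front of a slower backend, with per-role attribute access."""

    def __init__(self, backend: Callable[[str], dict], cache: TTLCache,
                 role_permissions: Dict[str, Set[str]]):
        self._backend = backend              # e.g. a Bigtable read, here any callable
        self._cache = cache
        self._permissions = role_permissions  # role -> attribute names it may read

    def get_attributes(self, role: str, user_id: str) -> dict:
        allowed = self._permissions.get(role, set())
        attributes = self._cache.get(user_id)
        if attributes is None:
            attributes = self._backend(user_id)  # cache miss: read the backend once
            self._cache.set(user_id, attributes)
        # RBAC: return only the attributes this role is allowed to read.
        return {k: v for k, v in attributes.items() if k in allowed}
```

Repeated reads for the same user within the TTL never reach the backend, and a role with no registered permissions receives an empty result.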


3. Monitoring and alerting
Another important point in building a CDP is developing the correct monitoring and alerting system to maintain the stability of our platform. A soft and a hard threshold are established and monitored for each metric. Once a threshold is reached, alerts are sent through the communication channel. Based on the current architecture, there are several parts for which we need to enable monitoring and alerting.
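
The soft and hard thresholds described above can be sketched as a two-level check. The metric name, the threshold values, and the message format below are hypothetical, and the sketch assumes a metric where a higher value is worse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    metric: str
    soft: float  # warning level
    hard: float  # critical level


def evaluate(threshold: Threshold, value: float) -> Optional[str]:
    """Return 'critical', 'warning', or None for a metric where higher is worse."""
    if value >= threshold.hard:
        return "critical"
    if value >= threshold.soft:
        return "warning"
    return None


def build_alert(threshold: Threshold, value: float) -> Optional[str]:
    """Format the message that would be posted to the communication channel."""
    severity = evaluate(threshold, value)
    if severity is None:
        return None
    return (f"[{severity.upper()}] {threshold.metric}={value} "
            f"(soft={threshold.soft}, hard={threshold.hard})")
```

Checking the hard threshold first ensures that a value above both levels is reported once, at the higher severity.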

  • Data Pipeline
    One of the things we need to monitor is resource consumption during computation and in the data pipeline from the data sources to the CDP storage, as we use BigQuery and Dataflow for data computation and the data pipeline. In BigQuery, we need to monitor the slot utilization consumed by the data aggregation or manipulation that produces the attributes.
  • Data Quality
    For the CDP to be a trusted platform, high-quality data is essential. Important data quality metrics include Data Completeness, Data Validity, Data Anomaly, and Data Consistency. Therefore, monitoring needs to be enabled for each of these metrics.
  • Storage and API Performance
    Since the CDP's backend and API interact directly with several front-facing features, we have to ensure the availability of the CDP service. Since we're using Bigtable as the backend, monitoring of CPU, latency, and RPS is required. These metrics are provided by default in Bigtable monitoring.
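
Two of the data quality metrics named above, completeness and validity, can be computed over a batch of attribute records roughly as follows. The article does not give formulas, so these definitions and the record shape are assumptions.

```python
from typing import List, Set


def completeness(records: List[dict], attribute: str) -> float:
    """Share of records where the attribute is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(attribute) is not None)
    return filled / len(records)


def validity(records: List[dict], attribute: str, allowed: Set[str]) -> float:
    """Share of non-missing values that belong to the allowed set."""
    values = [r[attribute] for r in records if r.get(attribute) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if v in allowed) / len(values)
```

Each ratio can then be compared against its own soft and hard thresholds, with a lower value being worse for these metrics.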

4. Discoverability across the company
Many users have asked how they can browse the attributes that our CDP offers. Initially, we documented our attributes and shared the documentation with our stakeholders. However, as the number of attributes increased, it became increasingly hard for people to go through the documentation. This pushed us to start integrating the CDP terminology into our Data Catalog. The Data Catalog now plays an important role in enabling users to browse attributes in the CDP, including the definition of each attribute and how to retrieve the data.
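
The kind of lookup a catalog enables can be illustrated with a small keyword search. The entries, field names, and `search_catalog` function are hypothetical; the article does not describe the Data Catalog's actual interface.

```python
from typing import Dict, List

# Hypothetical catalog entries; the article does not list the real ones.
CATALOG: List[Dict[str, str]] = [
    {
        "name": "age_group",
        "definition": "Decade bracket derived from the user's age",
        "how_to_retrieve": "CDP API, attribute 'age_group'",
    },
    {
        "name": "location",
        "definition": "City of the user's primary address",
        "how_to_retrieve": "CDP API, attribute 'location'",
    },
]


def search_catalog(keyword: str, catalog: List[Dict[str, str]] = CATALOG) -> List[str]:
    """Return names of attributes whose name or definition mentions the keyword."""
    needle = keyword.lower()
    return [
        entry["name"]
        for entry in catalog
        if needle in entry["name"].lower() or needle in entry["definition"].lower()
    ]
```

A user searching for "address" would find the `location` attribute together with its definition and retrieval instructions, without reading the full documentation.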

5. Implementation and adoption of the platform
Another key point for a successful CDP implementation is collaboration across teams on the front end services. There are several types of CDP implementation in Tokopedia: Personalization, Marketing Analytics, and Self Service Analytics.

  • Personalization
    The most common use of the CDP is personalizing a user's journey. One example is the search feature. The product team personalizes the user's search results based on the user's address, so that the user can find products close to their location. After agreeing on the definition of user address, we created a CDP API contract with the Search team so that development could run in parallel. As a result, our users today have a better experience based on their location.
  • Marketing Analytics
    When we started building the CDP, we discussed existing use cases with the Marketing team. One of their goals was to personalize and optimize marketing efforts, such as sending notifications to the right users based on their attributes, in order to reduce the cost of unnecessary notifications to unrelated users and to enhance the overall user experience by avoiding spam notifications. Once we understood their needs, we looked at the ways in which the CDP could cater to them. We discussed with the relevant team how to integrate the segmentation engine and communication channel with the CDP, and which user attributes to use when sending marketing push notifications.
  • Self-Service Analytics
    The CDP also powers self-service analytics, enabling quick insights into user demographics and behavior in certain segments. To build this self-service analytics tool, our team consulted with the Product and Analyst teams to define the demographic attributes that business and product users often select for insights. After understanding the attributes required, we worked with the Business Intelligence team to enable the visualization for the end user. This allowed different teams to understand our users better and gain insights on how we can improve our platform.
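
The attribute-based targeting used in the marketing use case can be sketched as a simple segment filter. The user records, attribute names, and `select_segment` function are hypothetical, and a real segmentation engine would query the CDP rather than an in-memory dictionary.

```python
from typing import Dict, List


def select_segment(users: Dict[str, dict], criteria: Dict[str, str]) -> List[str]:
    """Return the sorted IDs of users whose attributes match every criterion."""
    return sorted(
        user_id
        for user_id, attributes in users.items()
        if all(attributes.get(key) == value for key, value in criteria.items())
    )
```

Given users `001` (20s, Jakarta), `002` (30s, Jakarta), and `003` (20s, Bandung), the criteria `{"age_group": "20s", "location": "Jakarta"}` select only `001`, so a notification goes to that user and not to unrelated ones.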

CDP implementation has had a significant impact on different use cases and helped Tokopedia become a more data-driven company. Through the CDP, we are also able to strengthen one of the core parts of our DNA, Focus on Consumer. By sharing the CDP framework, we hope to bring value and help others more easily create a thriving CDP.

 

 

By: Kent Stanley (Data Engineering Lead, Tokopedia)
Source: Google Cloud Blog

