aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Engineering
  • Solutions

Meet Optimus, Gojek’s Open-Source Cloud Data Transformation Tool

  • aster_cloud
  • October 21, 2022
  • 4 minute read
Editor’s note: Earlier this year, we heard from Gojek, the ​​on-demand services platform, about the open-source data ingestion tool it developed for use with data warehouses like BigQuery. Today, Gojek VP of Engineering Ravi Suhag is back to discuss the open-source data transformation tool it is building.


In a recent post, we introduced Firehose, an open source solution by Gojek for ingesting data to cloud data warehouses like Cloud Storage and BigQuery. Today, we take a look at another project within the data transformation and data processing flow.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

As Indonesia’s largest hyperlocal on-demand services platform, Gojek has diverse data needs across transportation, logistics, food delivery, and payments processing. We also run hundreds of microservices across billions of application events. While Firehose solved our need for smarter data ingestion across different use cases, our data transformation tool, Optimus, ensures the data is ready to be accessed with precision wherever it is needed.

The challenges in implementing simplicity

At Gojek, we run our data warehousing across a large number of data layers within BigQuery to standardize and model data that’s on its way to being ready for use across our apps and services.

Gojek’s data warehouse has thousands of BigQuery tables. More than 100 analytics engineers run nearly 4,000 jobs on a daily basis to transform data across these tables. These transformation jobs process more than 1 petabyte of data every day.

Apart from the transformation of data within BigQuery tables, teams also regularly export the cleaned data to other storage locations to unlock features across various apps and services.

Read More  Unveiling The 2021 Google Cloud Partner Of The Year Award Winners

This process addresses a number of challenges:

  • Complex workflows: The large number of BigQuery tables and hundreds of analytics engineers writing transformation jobs simultaneously creates a huge dependency on very complex database availability groups (DAGs) to be scheduled and processed reliably.
  • Support for different programming languages: Data transformation tools must ensure standardization of inputs and job configurations, but they must also comfortably support the needs of all data users. They cannot, for instance, limit users to only a single programming language.
  • Difficult to use transformation tools: Some transformation tools are hard to use for anyone that’s not a data warehouse engineer. Having easy-to-use tools helps remove bottlenecks and ensure that every data user can produce their own analytical tables.
  • Integrating changes to data governance rules: Decentralizing access to transformation tools requires strict adherence to data governance rules. The transformation tool needs to ensure columns and tables have personally identifiable information (PII) and non-PII data classifications correctly inserted, across a high volume of tables.
  • Time-consuming manual feature updates: New requirements for data extraction and transformation for use in new applications and storage locations are part of Gojek’s operational routine. We need to design a data transformation tool that could be updated and extended with minimal development time and disruption to existing use cases.

Enabling reliable data transformation on data warehouses like BigQuery

With Optimus, Gojek created an easy-to-use and reliable performance workflow orchestrator for data transformation, data modeling, data pipelines, and data quality management. If you’re using BigQuery as your data warehouse, Optimus makes data transformation more accessible for your analysts and engineers. This is made possible through simple SQL queries and YAML configurations, with Optimus handling many key demands including dependency management, and scheduling data transformation jobs to run at scale.

Read More  The Top Three Insights We Learned From Data Analytics Customers In 2021
Key features include:

  • Command line interface (CLI): The Optimus command line tool offers effective access to services and job specifications. Users can create, run, and replay jobs, dump a compiled specification for a scheduler, create resource specifications for data stores, add hooks to existing jobs, and more.
  • Optimized scheduling: Optimus offers an easy way to schedule SQL transformation through YAML based configuration. While it recommends Airflow by default, it is extensible enough to support other schedulers that can execute Docker containers.
  • Dependency resolution and dry runs: Optimus parses data transformation queries and builds dependency graphs automatically. Deployment queries are given a dry-run to ensure they pass basic sanity checks.
  • Powerful templating: Users can write complex transformation logic with compile time template options for variables, loops, IF statements, macros, and more.
  • Cross-tenant dependency: With more than two tenants registered, Optimus can resolve cross-tenant dependencies automatically.
  • Built-in hooks: If you need to sink a BigQuery table to Kafka, Optimus can make it happen thanks to hooks for post-transformation logic that extend the functionality of your transformations.
  • Extensibility with plugins: By focusing on the building blocks, Optimus leaves governance for how to execute a transformation to its plugin system. Each plugin features an adapter and a Docker image, and Optimus supports Python transformation for easy custom plugin development.

Key advantages of Optimus

Like Google Cloud, Gojek is all about flexibility and agility, so we love to see open source software like Optimus helping users take full advantage of multi-tenancy solutions to meet their specific needs.

Through a variety of configuration options and a robust CLI, Optimus ensures that data transformation remains fast and focused by preparing SQL correctly. Optimus handles all scheduling, dependencies, and table creation. With the capability to build custom features quickly based on new needs through Optimus plugins, you can explore more possibilities. Errors are also minimized with a configurable alert system that flags job failures immediately. Whether to email or Slack, you can trigger alerts based on specific requirements – from point of failure to warnings based on SLA requirements.

Read More  What's New In Google Cloud Databases: More Unified. More Open. More Intelligent.

How you can contribute

With Firehose and Optimus working in tandem with Google Cloud, Gojek is helping pave the way in building tools that enable data users and engineers to achieve fast results in complex data environments.

Optimus is developed and maintained at Github and uses Requests for Comments (RFCs) to communicate ideas for its ongoing development. The team is always keen to receive bug reports, feature requests, assistance with documentation, and general discussion as part of its Slack community.

 

 

By: Ravi Suhag (VP, Engineering, Gojek)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster_cloud

Related Topics
  • Data Analytics
  • Gojek
  • Google Cloud
  • Optimus
  • SQL
You May Also Like
Web
View Post
  • Engineering
  • Software Engineering

Mastering the Art of Load Testing for Web Applications

  • November 29, 2023
Data center. Servers.
View Post
  • Data
  • Platforms
  • Software

Intel Granulate Optimizes Databricks Data Management Operations

  • November 27, 2023
Ubuntu. Chiselled containers.
View Post
  • Engineering
  • Technology

Canonical Announces The General Availability Of Chiselled Ubuntu Containers

  • November 25, 2023
Brush, Color, and Sketch pad
View Post
  • Cloud-Native
  • Design
  • Engineering

6 Security Best Practices For Cloud-Native Applications

  • November 17, 2023
Ingrasys
View Post
  • Computing
  • Engineering
  • Technology

Ingrasys Unveils Next-Gen AI And Cooling Solutions At Supercomputing 2023

  • November 15, 2023
Malware, Security, and Laptop
View Post
  • Engineering
  • Technology

Singapore And Google Partner On Web Risk To Protect Citizens From Online Scams And Phishing

  • November 12, 2023
View Post
  • Engineering
  • Public Cloud

Golang’s GORM Support For Cloud Spanner Is Now Generally Available

  • November 9, 2023
Cloud
View Post
  • Design
  • Engineering
  • Public Cloud

The Impact Of Public Cloud Price Hikes

  • November 8, 2023

Stay Connected!
LATEST
  • Web 1
    Mastering the Art of Load Testing for Web Applications
    • November 29, 2023
  • Data center. Servers. 2
    Intel Granulate Optimizes Databricks Data Management Operations
    • November 27, 2023
  • Ubuntu. Chiselled containers. 3
    Canonical Announces The General Availability Of Chiselled Ubuntu Containers
    • November 25, 2023
  • Cyber Monday Sale. Guzz. Ideals collection. 4
    Decode Workweek Style with guzz
    • November 23, 2023
  • Guzz. Black Friday Specials. 5
    Art Meets Algorithm In Our Exclusive Shirt Collection!
    • November 23, 2023
  • Presents. Gifts. 6
    25 Besties Bargain Bags Below $100 This Black Friday 2023
    • November 22, 2023
  • Electronics 7
    Top 10+1 You Can’t Do Without For The Holidays: Electronics Edition.
    • November 20, 2023
  • Microsoft. Windows 8
    Ousted Sam Altman To Lead New Microsoft AI Team
    • November 20, 2023
  • Sale. Deals. Discount. 9
    The 50 Best Electronic Deals To Get On Amazon Before Cyber Monday 2023
    • November 20, 2023
  • Portrait of Rosalynn Carter, 1993 10
    Former First Lady Rosalynn Carter Passes Away at Age 96
    • November 19, 2023
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • Oracle | Microsoft 1
    Oracle Cloud Infrastructure Utilized by Microsoft for Bing Conversational Search
    • November 7, 2023
  • Riyadh Air and IBM 2
    Riyadh Air And IBM Sign Collaboration Agreement To Establish Technology Foundation Of The Digitally Led Airline
    • November 6, 2023
  • Ingrasys 3
    Ingrasys Unveils Next-Gen AI And Cooling Solutions At Supercomputing 2023
    • November 15, 2023
  • Cloud 4
    DigitalOcean Currents Report Finds That Adoption Of AI/ML, And Investments In Cybersecurity And Multi-Cloud Strategies Are On The Rise At Small Businesses
    • November 9, 2023
  • OpenAI 5
    OpenAI Announces Leadership Transition
    • November 18, 2023
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.