aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Engineering
  • Solutions
  • Tools

Introducing Firehose: An Open Source Tool From Gojek For Seamless Data Ingestion To BigQuery And Cloud Storage

  • aster.cloud
  • July 4, 2022
  • 4 minute read

Indonesia’s largest hyperlocal company, Gojek has evolved from a motorcycle ride-hailing service into an on-demand mobile platform, providing a range of services that include transportation, logistics, food delivery, and payments. A total of 2 million driver-partners collectively cover an average distance of 16.5 million kilometers each day, making Gojek Indonesia’s de-facto transportation partner.

To continue supporting this growth, Gojek runs hundreds of microservices that communicate across multiple data centers. Applications are based on an event-driven architecture and produce billions of events every day. To empower data-driven decision-making, Gojek uses these events across products and services for analytics, machine learning, and more.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

Data warehouse ingestion challenges

To make sense of large amounts of data — and to better understand customers for the purpose of app development, customer support, growth, and marketing purposes — data must first be ingested into a data warehouse. Gojek uses BigQuery as its primary data warehouse. But ingesting events at Gojek’s scale, with rapid changes, poses the following challenges:

  • With multiple products and microservices offered, Gojek releases new Kafka topics almost every day and they need to be ingested for analytical purposes. This can quickly result in significant operational overhead for the data engineering team that is deploying new jobs to load data into BigQuery and Cloud Storage.
  • Frequent schema changes in Kafka topics require consumers of those topics to load the new schema to avoid data loss and capture more recent changes.
  • Data volumes can vary and grow exponentially as people start building new products and logging new activities on top of a new topic. Each topic can also have a different load during peak business hours. Customers need to handle the rising volume of data to quickly scale per their business needs.
Read More  The 2022 NRF "Big Show" Validates Retail’s Need To Leverage Digital Connectivity And Data To Transform Business

Firehose and Google Cloud to the rescue

To solve these challenges, Gojek uses Firehose, a cloud-native service to deliver real-time streaming data to destinations like service endpoints, managed databases, data lakes, and data warehouses like Cloud Storage and BigQuery. Firehose is part of the Open Data Ops Foundation (ODPF), and is fully open source. Gojek is one of the major contributors to ODPF.

Here are Firehose’s key features:

  • Sinks – Firehose supports sinking stream data to the log console, HTTP, GRPC, PostgresDB (JDBC), InfluxDB, Elastic Search, Redis, Prometheus, MongoDB, GCS, and BigQuery.
  • Extensibility – Firehose allows users to add a custom sink with a clearly defined interface, or choose from existing sinks.
  • Scale – Firehose scales in an instant, both vertically and horizontally, for a high-performance streaming sink with zero data drops.
  • Runtime – Firehose can run inside containers or VMs in a fully-managed runtime environment like Kubernetes.
  • Metrics – Firehose always lets you know what’s going on with your deployment, with built-in monitoring of throughput, response times, errors, and more.

 

Key advantages

Using Firehose for ingesting data in BigQuery and Cloud Storage has multiple advantages.

Reliability
Firehose is battle-tested for large-scale data ingestion. At Gojek, Firehose streams 600 Kafka topics in BigQuery and 700 Kafka topics in Cloud Storage. On average, 6 billion events are ingested daily in BigQuery, resulting in more than 10 terabytes of daily data ingestion.

Streaming ingestion
A single Kafka topic can produce up to billions of records in a day. Depending on the nature of the business, scalability and data freshness are key to ensuring the usability of that data, regardless of the load. Firehose uses BigQuery streaming ingestion to load data in near-real-time. This allows analysts to query data within five minutes of it being produced.

Read More  How MEDITECH Adds Advanced Security To Its Cloud-Based Healthcare Solutions With Cloud IDS

Schema evolution
With multiple products and microservices offered, new Kafka topics are released almost every day, and the schema of Kafka topics constantly evolves as new data is produced. A common challenge is ensuring that as these topics evolve, their schema changes are adjusted in BigQuery tables and Cloud Storage. Firehose tracks schema changes by integrating with Stencil, a cloud-native schema registry, and automatically updates the schema of BigQuery tables without human intervention. This reduces data errors and saves developers hundreds of hours.

Elastic infrastructure
Firehose can be deployed on Kubernetes and runs as a stateless service. This allows Firehose to scale horizontally as data volumes vary.

Organizing data in cloud storage 
Firehose GCS Sink provides capabilities to store data based on specific timestamp information, allowing users to customize how their data is partitioned in Cloud Storage.

Supporting a wide range of open source software

Built for flexibility and reliability, Google Cloud products like BigQuery and Cloud Storage are made to support a multi-cloud architecture. Open source software like Firehose is just one of many examples that can help developers and engineers optimize productivity. Taken together, these tools can deliver a seamless data ingestion process, with less maintenance and better automation.

How you can contribute

Development of Firehose happens in the open on GitHub, and we are grateful to the community for contributing bug fixes and improvements. We would love to hear your feedback via GitHub discussions or Slack.

 

 

By: Ravi Suhag (VP, Engineering, Gojek)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • BigQuery;
  • Data Analytics
  • Data Warehouse
  • Firehouse
  • Gojek
  • Google Cloud
  • Google Cloud Storage
  • Indonesia
  • Open Source
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • March 9, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.