We’re excited to announce that Datastream, Google Cloud’s serverless change data capture (CDC) and replication service, is now generally available. Datastream allows you to synchronize data across disparate databases, storage systems, and applications reliably and with minimal latency to support real-time analytics, database replication, and event-driven architectures. You can easily and seamlessly deliver change streams from Oracle and MySQL databases into Google Cloud services such as BigQuery, Cloud SQL, Google Cloud Storage and Cloud Spanner, saving time and resources and ensuring your data is accurate and up to date. Get started with Datastream today.
Since our public preview launch earlier this year, we’ve seen Datastream used across a variety of industries, by customers such as Chess.com, Cogeco, Schnuck Markets, and MuchBetter. This early adoption strengthens the message we’ve been hearing from customers about the demand for change data capture to provide replication and streaming capabilities for real-time analytics and business operations.
MuchBetter is a multi-award-winning e-wallet app, providing a truly secure and enjoyable banking alternative for customers all over the world. Working with Google Cloud Premier Partner Datatonic, they’re leveraging Datastream to replicate real-time data from MySQL OLTP databases into a BigQuery data warehouse to power their analytics needs. According to Andrew McBrearty, Head of Technology at MuchBetter, “from MuchBetter’s point of view, leveraging Dataflow, BigQuery and Looker has unlocked additional insights from our ever-increasing operational data. Using Datastream in our solution ensured continued real-time capability – we now have trend analysis in place, improved efficiency across the business, and the ability to use our data to derive actionable insights and to make data-driven decisions. This means we can continue to grow and adapt at a pace our customers have come to expect from MuchBetter. And for the first time, the world of ML and AI is open to us.”
Getting to know Datastream
Google Cloud customers are choosing Datastream for real-time change data capture because of its differentiated approach:
Real-time replication of change data shouldn’t be complicated: database preparation documentation, secure connectivity setup, and stream validation should be built right into the flow. Datastream delivers on this experience, as MuchBetter discovered during their evaluation of the product. “Datastream’s ease-of-use and immediate availability (serverless) meant we could start our evaluation and immediately see results,” says Mark Venables, Principal Data Engineer at MuchBetter. “For us, this meant getting rid of the considerable pre-work needed to align proof of concept tests with third-party CDC suppliers.”
Building pipelines to replicate changes from your source database shouldn’t take up all of your team’s time. Use pre-built Dataflow templates to easily replicate data into BigQuery, Cloud Spanner or Cloud SQL. Out of the box, these Dataflow templates will automatically create the tables and update the data at the destination, taking care of any out-of-order or duplicate events, and providing error resolution capabilities. Leverage the templates’ flexibility to fine-tune Dataflow to fit your specific needs. “Google-managed Dataflow templates meant getting our pipelines up and running with minimal effort and fuss – this allowed more time to be spent on more complex pipeline development whilst tactically delivering solutions to our users,” says Venables.
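To make "taking care of any out-of-order or duplicate events" concrete, here is a minimal sketch of one common way a consumer can apply a change stream idempotently: keep only the newest event per primary key, and ignore anything that is not newer than what has already been applied. This is an illustration under stated assumptions, not the logic the Dataflow templates actually use. The `ChangeEvent` type, its field names, and the idea of a single increasing `position` per source are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; field names are illustrative only."""
    key: str                    # primary key of the source row
    position: int               # assumed: increasing position in the source log
    op: str                     # "UPSERT" or "DELETE"
    row: Optional[dict] = None  # row contents for upserts


def apply_events(events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Fold change events into a table keyed by primary key."""
    latest: Dict[str, ChangeEvent] = {}
    for event in events:
        seen = latest.get(event.key)
        # Skip duplicates (same position) and stale events (older position).
        if seen is not None and event.position <= seen.position:
            continue
        latest[event.key] = event
    # Deletes stay in `latest` as tombstones so a late, older upsert
    # cannot resurrect the row; they are dropped only from the result.
    return {k: dict(e.row or {}) for k, e in latest.items() if e.op != "DELETE"}


events = [
    ChangeEvent("42", 3, "UPSERT", {"id": "42", "balance": 70}),
    ChangeEvent("42", 1, "UPSERT", {"id": "42", "balance": 100}),  # arrives late
    ChangeEvent("42", 3, "UPSERT", {"id": "42", "balance": 70}),   # duplicate
    ChangeEvent("7", 2, "UPSERT", {"id": "7", "balance": 5}),
    ChangeEvent("7", 4, "DELETE"),
    ChangeEvent("7", 2, "UPSERT", {"id": "7", "balance": 5}),      # stale replay
]
table = apply_events(events)
print(table)
```

Because every event is compared against the last applied position for its key, replaying the same batch twice leaves the table unchanged, which is the property that makes duplicate delivery harmless.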
Datastream keeps your migrated data secure, supporting private connectivity between source and destination databases. “Establishing connectivity is often viewed as hard. Datastream surprised us with its ease of use & setup, even in more secure modes,” says Grzegorz Dlugolecki, Principal Cloud Architect at Chess.com, a leading online chess community and mobile application, hosting more than ten million chess games every day. “Datastream’s private connectivity configuration allowed us to easily create a private connection between our source and the destination, and ensure our data is safe and secure.”
High throughput, low latency
With Datastream’s serverless architecture, you don’t need to worry about provisioning, managing machines, or scaling up resources to meet fluctuations in data throughput. Datastream delivers high performance – a single stream can process tens of MB per second while keeping latency minimal. “We evaluated several market-leading ETL solutions,” says Dlugolecki. “Datastream was the only tool able to successfully sync our complex, single-table datasets, doing this in weeks instead of years estimated by the other vendors.”
Getting started with Datastream
You can start streaming real-time changes from your Oracle and MySQL databases today using Datastream:
- Navigate to the Datastream area of your Google Cloud console, under Big Data, and click Create Stream.
- Choose the source database type, and see what actions you need to take to set up your source.
- Create your source connection profile, which can later be used for additional streams.
- Define how you want to connect to your source.
- Create and configure your destination connection profile.
- Validate your stream and make sure the test was successful. Start the stream when you’re ready.
Once the stream is started, Datastream will backfill historical data and will continuously replicate new changes as they happen.
Learn more and start using Datastream today
Datastream is now generally available for Oracle and MySQL sources. Datastream supports sources both on-premises and in the cloud, and captures historical data and changes into Cloud Storage. Integrations with Cloud Data Fusion and Cloud Dataflow (our data integration and stream processing products, respectively) replicate changes to other Google Cloud destinations, including BigQuery, Cloud Spanner, and Cloud SQL.
By: Etai Margolin (Product Manager)
Source: Google Cloud Blog