We are announcing that Looker’s new Google Cloud operators for Apache Airflow are available in Cloud Composer, Google Cloud’s fully managed service for orchestrating workflows across cloud, hybrid, and multi-cloud environments. This integration gives users the ability to orchestrate Looker persistent derived tables (PDTs) alongside the rest of their data pipeline.
Looker PDTs are the materialized results of a query, written to a Looker scratch schema in the connected database and rebuilt on a defined schedule. Because they are defined in LookML, PDTs reduce friction and speed up time to value by putting the power to create robust data transformations in the hands of data modelers. However, administering these transformations can be difficult to scale. With this new integration, customers gain greater visibility into, and more granular control over, their data transformations.
Using Looker with Cloud Composer enables customers to:
- Know exactly when PDTs will rebuild by tying PDT regeneration jobs directly to the completion of other data transformation jobs. This keeps PDTs up to date without relying on Looker datagroups that repeatedly query the underlying data for changes, and it lets admins closely control job timing and resource consumption.
- Automatically kick off other tasks that use data from PDTs, such as piping transformed data into a machine learning model or delivering it to another tool or file store.
- Get alerted quickly when errors occur, for more proactive troubleshooting and issue resolution.
- Save time and resources by quickly identifying the point of failure within a chain of cascading PDTs and restarting the build from there rather than from the beginning. Within Looker alone, the only options are to rebuild a single PDT or the entire chain.
- Pick up changes in your underlying database by forcing incremental PDTs to reload in full, either on a schedule or ad hoc with the click of a button.
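To make the restart-from-failure idea concrete, here is a minimal, standalone Python sketch (not part of Looker or Airflow). Given a hypothetical chain of cascading PDTs and the PDT whose build failed, it returns only the builds that need to run again, in dependency order. In Airflow this roughly corresponds to clearing the failed task together with its downstream tasks. The PDT names and the `builds_to_rerun` helper are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical chain of cascading PDTs: each PDT maps to the PDTs it is built from.
PDT_DEPENDENCIES: dict[str, set[str]] = {
    "orders_base": set(),
    "orders_enriched": {"orders_base"},
    "customer_facts": {"orders_enriched"},
    "revenue_rollup": {"orders_enriched"},
    "exec_summary": {"customer_facts", "revenue_rollup"},
}


def builds_to_rerun(deps: dict[str, set[str]], failed: str) -> list[str]:
    """Return the failed PDT plus every PDT downstream of it, in a valid build order."""
    # Invert the graph so we can walk from a PDT to the PDTs built on top of it.
    dependents: dict[str, set[str]] = {name: set() for name in deps}
    for name, upstream in deps.items():
        for parent in upstream:
            dependents[parent].add(name)

    # Depth-first walk collecting the failed PDT and all of its transitive dependents.
    to_rerun = {failed}
    stack = [failed]
    while stack:
        for child in dependents[stack.pop()]:
            if child not in to_rerun:
                to_rerun.add(child)
                stack.append(child)

    # Filter a topological order of the whole chain so upstream builds still come first.
    return [name for name in TopologicalSorter(deps).static_order() if name in to_rerun]


print(builds_to_rerun(PDT_DEPENDENCIES, "customer_facts"))
# ['customer_facts', 'exec_summary']
```

Here `orders_base`, `orders_enriched`, and `revenue_rollup` are not rebuilt because none of them depends on the failed PDT.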
Pairing Looker with Cloud Composer gives customers a way to accomplish key tasks like these, making PDT usage easier to manage and scale.
Two new Looker operators are available for managing PDT builds with Cloud Composer:
- LookerStartPdtBuildOperator: initiates materialization for a PDT based on a specified model name and view name and returns the materialization ID.
- LookerCheckPdtBuildSensor: checks the status of a PDT build based on a provided materialization ID for the PDT build job.
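Splitting the work between a start operator and a sensor is a common Airflow pattern for long-running external jobs. One task kicks off the build and hands back an ID, and another task polls that ID until the job reaches a final state. The plain-Python sketch below models that polling loop without Airflow or the Looker SDK. It is not the sensor's actual implementation: `check_status` is a stand-in for whatever call fetches a build's status, and the status strings are placeholders rather than the values the Looker API is guaranteed to return.

```python
import time
from typing import Callable

# Placeholder terminal states; check the Looker API reference for the real status values.
SUCCESS_STATES = {"complete"}
FAILURE_STATES = {"error", "killed"}


def wait_for_pdt_build(
    check_status: Callable[[str], str],
    materialization_id: str,
    poke_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a build by its materialization ID until it succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = check_status(materialization_id)
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"PDT build {materialization_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"PDT build {materialization_id} still {status!r} after {timeout}s")
        # Wait before asking again, like a sensor's poke interval.
        sleep(poke_interval)


# Usage with a fake status source and a no-op sleep.
statuses = iter(["new", "running", "complete"])
print(wait_for_pdt_build(lambda _id: next(statuses), "mat-123", sleep=lambda s: None))
# complete
```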
These operators can be used in Cloud Composer to create tasks inside a directed acyclic graph (DAG), with each task representing a specific PDT build. These tasks can then be organized according to the relationships and dependencies between different PDTs and other data transformation jobs.
You can start using Looker and Cloud Composer together in a few steps:
- Within your connection settings in your Looker instance, turn on the Enable PDT API Control toggle. Make sure that this setting is enabled for any connection with PDTs that you’d like to manage using Cloud Composer.
- Set up a Looker connection in Cloud Composer. You can configure this connection directly in Airflow, but for production use we recommend storing it in Secret Manager, which Cloud Composer supports as an Airflow secrets backend.
- Create a DAG using Cloud Composer.
- Add tasks into your DAG for PDT builds.
- Define dependencies between tasks within your DAG.
To learn more about how to externally orchestrate your Looker data transformations, see this tutorial in the Looker Community.
Data Transformations at Scale
This integration between Looker and Cloud Composer pairs the speed and agility of PDTs with the added scalability and governance of Cloud Composer. By managing these Looker data transformations using Cloud Composer, customers can:
- Define and manage build schedules to help ensure that resources are allocated efficiently across all ongoing processes
- See running, failed, and completed jobs, including Looker data transformations, in one place
- Leverage the output of a PDT within other automated data transformations taking place outside of Looker
Thanks to this integration with Cloud Composer, Looker customers can empower modelers and analysts to transform data quickly while relying on a scalable governance model for managing and maintaining those transformations. Looker operators for Cloud Composer are generally available to customers using an Airflow 2 environment. For more information, check out the Cloud Composer documentation or read this tutorial on setting up Looker with Apache Airflow.
Acknowledgements: Aleks Flexo, Product Manager
By: Aleksei Loginov (Software Engineer) and Maire Newton (Outbound Product Manager, Google Cloud)
Source: Google Cloud Blog