BigQuery Stored Procedures for Apache Spark now entering Preview
In the past, customers have found it hard to manage their data across both data warehouses and data lakes. Earlier this year, we announced BigLake, a storage engine that enables customers to store data in open file formats (such as Parquet) on Google Cloud Storage, and run GCP and open source engines on it in a secure, governed and performant manner. Today, BigQuery opens the next chapter in that story: it unifies data warehouse and data lake processing by embedding the Apache Spark engine directly into BigQuery.
With BigQuery stored procedures for Apache Spark, you can run Apache Spark programs from BigQuery, unifying your advanced transformation and ingestion pipelines as BigQuery processes. With a stored procedure, you can schedule Apache Spark as a step in a set of SQL statements, mixing and matching unstructured data lake objects with structured SQL queries. You can also hand off the procedures to others, who can then execute Apache Spark jobs directly from SQL to retrain models or ingest complicated data structures without having to understand the underlying Apache Spark code.
The cost of running these Apache Spark jobs is based only on job duration and resources consumed. The costs are converted to either BigQuery bytes processed or BigQuery slots, giving you a single billing unit for both your data lake and data warehouse jobs.
Google Colab Integration with BigQuery Console now entering Preview
For years, BigQuery customers have found Colab to be a delightful notebook-based programming experience for extending their BigQuery SQL with Python-based analysis. Customers have asked us to make it easier to move between BigQuery SQL and a Colab notebook to improve their data workflows, so that is just what we have done. Now, in Preview, you can jump immediately from the result of a SQL query into a notebook to do further analysis in Python. This lets you move quickly into running descriptive statistics, generating visualizations, creating a predictive analysis, or even sharing your results with others.
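To give a feel for the "descriptive statistics" step once a query result is in a notebook, here is a minimal sketch using only the Python standard library. The `rows` list, its column names, and the `describe` helper are all hypothetical stand-ins for a real query result; the integration itself may hand the result to the notebook in a different shape (for example, as a DataFrame).

```python
import statistics

# Hypothetical rows standing in for a BigQuery query result.
# In a real notebook these would come from the query you ran in the console.
rows = [
    {"region": "us", "revenue": 120.0},
    {"region": "eu", "revenue": 95.5},
    {"region": "us", "revenue": 143.25},
    {"region": "apac", "revenue": 80.0},
]


def describe(rows, column):
    """Return basic descriptive statistics for one numeric column."""
    # Skip NULLs, which arrive in Python as None.
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


summary = describe(rows, "revenue")
for name, value in summary.items():
    print(f"{name}: {value}")
```

In practice most notebook users would likely reach for pandas instead; the standard-library version is used here only to keep the sketch free of dependencies.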
Remote Functions now GA
We had requests from healthcare providers who wanted to bring their existing security platforms to BigQuery, financial institutions that wanted to enrich their BigQuery data with real-time stock updates, and data scientists who wanted to be able to use Vertex AI alongside BQML. To help these customers extend BigQuery into these other components, we are now making BigQuery Remote Functions Generally Available.
Protegrity and CyberRes have already developed integrations with these remote functions as a mechanism to merge BigQuery into their security platform, which will help our mutual customers address stringent compliance controls.
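As a rough illustration of what sits behind a remote function, the sketch below shows the core of an HTTP endpoint handler for the stock-enrichment case. It assumes the documented remote function contract, in which BigQuery sends a JSON body with a `calls` array (one argument list per row) and expects a `replies` array of the same length, or an `errorMessage` with a non-200 status on failure. The `QUOTES` table and the `handle_remote_function` name are hypothetical; a real endpoint would wrap this logic in a Cloud Functions or Cloud Run HTTP handler and call an actual market data service.

```python
import json

# Hypothetical in-memory quote table standing in for a real market data service.
QUOTES = {"GOOG": 101.5, "MSFT": 250.0}


def handle_remote_function(request_body):
    """Process one batched request and return (response_dict, http_status).

    request_body is the already-parsed JSON payload sent by BigQuery.
    """
    try:
        replies = []
        for call in request_body["calls"]:
            ticker = call[0]  # first (and only) SQL argument for this row
            # Unknown tickers map to None, which serializes to JSON null.
            replies.append(QUOTES.get(ticker))
        return {"replies": replies}, 200
    except Exception as exc:
        # Assumed error convention: non-200 status plus an errorMessage field.
        return {"errorMessage": str(exc)}, 400


# Example payload in the assumed request shape: two rows, one argument each.
payload = json.loads('{"calls": [["GOOG"], ["UNKNOWN"]]}')
body, status = handle_remote_function(payload)
print(status, json.dumps(body))
```

Keeping the handler a plain function over dictionaries makes it easy to unit test before it is deployed behind an HTTP framework.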
By: Christopher Crosbie (Product Manager, Data Analytics) and Joe Malone (Product Manager, Data Analytics)
Source: Google Cloud Blog