Following the trend among cloud providers to offer a one-stop platform for all data, Google Cloud has released new tools that enable enterprises not only to generate business insights but also to perform data engineering operations.
According to the company, one of the many challenges that enterprises face today is managing data across disparate lakes and warehouses, which creates silos and increases risk and cost, especially when data needs to be moved.
To address this challenge, the company has released a new tool, dubbed BigLake.
“BigLake allows companies to unify their data warehouses and lakes to analyse data without worrying about the underlying storage format or system, which eliminates the need to duplicate or move data from a source and reduces cost and inefficiencies,” said Gerrit Kazmaier, vice president of database, data analytics, and Looker at Google Cloud.
“With BigLake, customers gain access controls, with an API interface spanning Google Cloud and open file formats like Parquet, along with open-source processing engines like Apache Spark,” Kazmaier added.
According to Constellation Research’s Doug Henschen, Google is responding to the trend toward combined lake and warehouse (or “lakehouse”) data platforms. These platforms promise to support both the SQL-based analytics associated with warehouses and the data science and data engineering work associated with the semi-structured and unstructured information held in data lakes.
Previously, Google Cloud offered BigQuery, a data warehouse service, and Dataproc, a Hadoop/Spark-based data lake service, separately.
“Cloudera, Databricks, Microsoft, Oracle, Snowflake, and SAP all have combined lake/warehouse offerings. And Amazon Redshift Spectrum has long been aligned with AWS’ Lake Formation capability for building lakes based on S3 object storage,” Henschen said.
Henschen added that enterprises need to understand to what degree each of these offerings really satisfies their analytics, data science, or data engineering requirements. “In general, the warehouse-rooted offerings cater more to analytics requirements and the lake-rooted offerings have better depth and functionality on the data science and data engineering side,” Henschen said.
BigLake, which is in preview, is now available for enterprises to try, Google said.
GCP introduces Change Data Capture
With the aim of making the latest data and datasets available to teams across an enterprise, Google Cloud has showcased a new change data capture (CDC) feature.
Called Spanner Change Streams, the new tool will allow enterprises to perform real-time CDC on their Google Cloud Spanner databases, tracking inserts, updates, and deletes as they happen, said Sudhir Hasbe, director of product management at Google Cloud.
According to Henschen, Spanner Change Streams will let enterprises stream changes out of Google Cloud Spanner to other destinations to meet low-latency requirements, rather than only supporting the flow of change data from other databases into Spanner.
Easing machine learning operations
Google has been working to ease machine learning operations since launching the Vertex AI platform in May 2021, followed by the introduction of the collaborative development environment Vertex AI Workbench in October.
“Vertex AI Workbench, which is now generally available, brings data and ML systems into a single interface so that teams have a common toolset across data analytics, data science, and machine learning. This capability enables teams to build, train, and deploy an ML model five times faster than the traditional notebooks,” said June Yang, vice president of Cloud AI and Industry Solutions at Google Cloud.
According to the company, the integrated development environment, which runs as a Google-managed notebook service, can access data across multiple services such as Dataproc, BigQuery, Dataplex, and Looker.
In addition, the company released a new feature dubbed Vertex AI Model Registry, which is currently in select preview. The Model Registry is aimed at making it easier for enterprises to manage the overhead of ML model maintenance, Yang said, adding that the feature provides a central repository for discovering, using, and governing machine learning models, including those in BigQuery ML.
According to Henschen, the new feature solves a critical problem for enterprises. “Registries help with model lifecycle management, a challenge that only gets tougher as the numbers of collaborators and the numbers of models grow. This helps data scientists, primarily, but also data engineers, the developers that put models into production and monitor and revise them as model performance degrades,” Henschen explained.
Amazon’s SageMaker and Azure’s Machine Learning Service already have this capability, the analyst said.
Looker gets two new features
New Looker features, Connected Sheets for Looker and the ability to access Looker data models within Data Studio, bolster and streamline Google Cloud’s analytics offerings, Henschen said.
“Customers now have the ability to interact with data whether it be through Looker Explore, or from Google Sheets, or using the drag-and-drop Data Studio interface. This will make it easier for everyone to access and unlock insights from data in order to drive innovation, and to make data-driven decisions with this new unified Google Cloud business intelligence platform,” Kazmaier said.
The Data Cloud Alliance and other partnerships
Google has formed a Data Cloud Alliance in partnership with Accenture, Confluent, Databricks, Dataiku, Deloitte, Elastic, Fivetran, MongoDB, Neo4j, Redis, and Starburst to make data more portable and accessible across disparate business systems, platforms, and environments.
Data Cloud Alliance members will provide infrastructure, APIs, and integration support to ensure data portability and accessibility between multiple platforms and products across multiple environments, the company said. Each member will also collaborate on new, common industry data models, processes, and platform integrations to increase data portability and reduce the complexity associated with data governance and global compliance.
To help enterprises migrate their databases, Google Cloud has partnered with system integrators and consulting firms such as TCS, Atos, Deloitte, HCL, Kyndryl, Infosys, Wipro, Capgemini, and Cognizant.
Other initiatives include the launch of Google Cloud Ready – BigQuery, a new validation program that recognises partner solutions like those from Fivetran, Informatica, and Tableau that meet a core set of functional and interoperability requirements.
“Today, we already recognise more than 25 partners in this new Google Cloud Ready – BigQuery program that reduces costs for customers associated with evaluating new tools while also adding support for new customer use cases,” Kazmaier said.