Google Cloud has always taken a different approach than its hyperscaler peers. Google Cloud was into open source long before it became cool and was similarly early in forging partnerships to meet customer needs.
Now, Google Cloud’s earnest efforts to use openness as a strategic lever to compete and collaborate are increasingly paying dividends, reaching a crescendo this week at the Google Next 2022 event.
When I interviewed Gerrit Kazmaier, vice president and general manager of data analytics at Google Cloud, he stressed the importance of “open” (open source, open standards, open data) throughout, mentioning it 29 times in our conversation (yes, I counted).
Add this to the 100+ times the word “open” was used in the blog posts, press releases, etc., that Google Cloud released today and Google Cloud’s overarching message is clear:
Google Cloud wants to be the open cloud.
This sounds super fluffy but could prove substantive in practice. Not to mention it’s extraordinarily difficult to pull off, requiring a unique approach to thinking about product and data ownership.
Making open pay
As I’ve written, Google Cloud’s embrace of open source helped it gain a strong foothold against Amazon Web Services (AWS), which was first to the cloud market. In Google Cloud’s case, I suggested that although open source contributor counts definitely don’t guarantee success, they can play a part in long-term, customer-obsessed strategies and help reshape markets.
That’s the bet, and it does seem to be working.
At Next, Google Cloud took additional steps to position itself as the “most open data cloud ecosystem” by “unifying all data, from all sources, across any platform.”
The words “all” and “any” suggest some hyperbole, but let’s not diminish how the Next announcements get Google Cloud to the point that it’s even remotely credible enough to make that statement:
- Added support for major data formats such as Apache Iceberg, Linux Foundation Delta Lake, and Apache Hudi
- Introduced a new, integrated experience in BigQuery for Apache Spark
- Expanded or introduced integrations with popular enterprise data platforms such as Collibra, Elastic, MongoDB (disclosure: I work for MongoDB), and others
Given Elastic’s interactions with other clouds, the Elastic partnership may be particularly interesting, as it’s a two-way integration: Google is making it easier for customers to federate their Elasticsearch queries to their data lakes on Google Cloud while extending Looker support into the Elastic platform.
I asked David Meyer, senior vice president of product management at Databricks, about the Delta Lake integration, given that Delta Lake was developed by Databricks, and both Databricks and Google Cloud compete for data warehousing workloads.
It comes down to customers, Meyer says: “Our customers said we need to be on Google.” Why? Larger Fortune 1000 companies “need some diversity in cloud from a leverage perspective,” he explains, “but also from a data estate perspective.”
These companies already tend to use Google Ads, so adding Google Cloud makes a lot of sense as they expand their cloud footprints. This becomes easier if they can keep their data in Delta Lake. Through this partnership and Google Cloud’s support for the Delta Lake format, customers can apply BigQuery to data sitting in their Delta data lakes without having to move it.
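To make the “query it where it sits” idea concrete, here is a minimal sketch of what that looks like from the developer’s side. The project, dataset, table, and column names are hypothetical, and the actual BigQuery client call is shown only as a comment; the point is that the query targets the external Delta-format table directly, with no copy step.

```python
# Illustrative sketch: running BigQuery SQL against a Delta Lake table that
# has been registered as an external (BigLake) table, rather than copying
# the data into native BigQuery storage first.
# All names below (project, dataset, table, columns) are hypothetical.

def build_delta_query(table: str, min_amount: float) -> str:
    """Build a standard-SQL query against an external Delta-format table."""
    return (
        f"SELECT order_id, amount "
        f"FROM `{table}` "
        f"WHERE amount >= {min_amount}"
    )

# The query text is ordinary BigQuery SQL; nothing marks the table as Delta.
sql = build_delta_query("my-project.lake.orders_delta", 100.0)

# With the google-cloud-bigquery client installed and credentials configured,
# the query would run in place against the Delta files in object storage:
#
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(sql).result()
```

The design point is that the Delta Lake support lives in the table definition, not in the query: applications keep issuing plain SQL, and the engine reads the open-format files where they already live.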
Google Cloud also announced some housekeeping matters (e.g., moving all of its business intelligence services under the Looker brand), but the vendor went well beyond housekeeping by deeply integrating Looker and Google Workspace to make BI-driven insights available in the familiar productivity tools (Google Sheets) customers will use day to day.
This isn’t “open” in the sense of open source, but it is open in the sense of lowering barriers to making use of data.
Other clouds have done this by making it easier to use, for example, MySQL or Linux. Google Cloud offers such things as well but goes one step further by making data easier to use, not just data infrastructure. Google’s introduction of Vertex AI Vision is similar: It makes otherwise complicated computer vision and image recognition AI more accessible to data practitioners.
Google Cloud may well employ a horde of PhDs, but thanks to new initiatives like these, you may not have to. This is good because, as Kazmaier emphasised, enterprise data estates are only going to grow in complexity.
Opening up data everywhere
Regardless of how much companies may claim to be “all in” on a single cloud, the messy reality is that they rarely are. CIOs can attempt to play Whac-A-Mole with application creep across multiple clouds, including on-premises infrastructure, but “data is spread across multiple clouds for the vast majority of companies,” notes Kazmaier.
Multi-cloud, therefore, isn’t about deploying the same solution on multiple clouds and having independent silos of the same technology. Rather, Kazmaier concludes, “It’s about interconnecting the data of multiple clouds into a holistic data landscape.”
This is the vision driving Google Cloud’s embrace of multi-cloud, with Anthos and other technologies supporting it. It’s also why the company announced this week that it’s now possible to analyse unstructured streaming data in BigQuery, enabling enterprises to combine the analysis of structured and unstructured data in one place (called BigLake).
That’s a huge, extraordinarily difficult problem to solve. Google isn’t announcing a “data cloud” while really meaning “a data warehouse that happens to pull data from the cloud.”
No, it’s talking about the seamless ability to analyse operational data from databases like MongoDB in perfect tandem with data warehousing/analytics services and AI/ML activation systems.
It’s ambitious. It’s impressive. But it’s also exceptionally hard to pull off in practice because it requires Google Cloud to think beyond Google Cloud when devising customer-centric products.
The only way to make it work is to stop thinking in terms of absolute ownership of the customer experience and associated data. No one disputes that Google Cloud has originated some of the industry’s most innovative open source code: Kubernetes, TensorFlow, etc.
But what Google Cloud and, I’d argue, each of the clouds needs to do now is not be the originator of everything. The cloud is too big for any one vendor, however large. No hyperscaler is ever “hyper” enough to be able to craft solutions to meet every need.
So far, Google Cloud seems to agree.
As mentioned, the vendor has always been partner oriented, but at this year’s Next event, the company has added more substance. To enable a deeply integrated partner ecosystem, Kazmaier says, Google Cloud needs to “have 100% open APIs,” but that’s not enough.
“It also means that the APIs we are using in our first-party products are the same APIs that we expose to our partners.” Yes, there will be edge cases where this isn’t possible, but those are exceptions, not the rule.
For Google Cloud, Kazmaier went on, many workloads are “best served by a partner, and our strategy is to open up our APIs so they can build. We don’t consider ourselves competing with them.”
If this sounds like a different approach, not just in the cloud, but in enterprise computing generally, it is. But it’s very much in keeping with Google’s core principles.
Perhaps it’s the natural conclusion if we start from the realisation that data continues to explode and not in any one particular place. If we assume that data will grow across clouds, that it will be both structured and unstructured, that it will require real-time and batch-oriented analysis, and that it will be complex in a myriad of other ways, then Google Cloud’s approach starts to seem inevitable.
In such a complex world awash in data, “an open platform will be the best choice for customers because it will ultimately offer them the greatest flexibility, the greatest degree of choice between multiple solutions, and a shorter time to value than anything else,” concludes Kazmaier.
It’s hard to argue with that logic.