Databricks is acquiring AI-centric data governance platform provider Okera for an undisclosed sum, the data lakehouse provider said on Wednesday.
The acquisition is expected to boost Databricks’ data governance capabilities for training and managing large language models (LLMs), such as the recently released Dolly 2.0, the company said.
“Okera solves data privacy and governance challenges across the spectrum of data and AI. It simplifies data visibility and transparency, helping organisations understand their data, which is essential in the age of LLMs and to address concerns about their biases,” Databricks said in a blog post.
The company believes that LLMs and generative AI call for an AI-based approach to data governance, because data volumes grow manifold and other concerns such as bias “fall outside the reach of traditional data governance platforms.”
What can Okera’s governance capabilities do?
The governance platform from Okera includes an AI interface that automatically discovers, classifies, and tags sensitive data, such as personally identifiable information.
“These tags enable data governance stakeholders to easily assess compliance and create no-code access policies that improve visibility and control over data,” Databricks said in the blog post.
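As a rough illustration of what tagging sensitive columns involves, here is a minimal Python sketch. It is not Okera's method: it swaps the AI-based classifier for two simple regular expressions, and every name in it (`PII_PATTERNS`, `tag_columns`, the sample table, the 0.8 threshold) is a hypothetical chosen for the example.

```python
import re

# Hypothetical patterns for illustration only; a production classifier
# would rely on far richer signals than two regular expressions.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def tag_columns(table, threshold=0.8):
    """Map each column to the PII tags matched by at least `threshold`
    of its non-empty sample values. Columns with no tags are omitted."""
    tags = {}
    for column, values in table.items():
        sample = [v for v in values if v]
        matched = []
        for tag, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in sample if pattern.match(v))
            if sample and hits / len(sample) >= threshold:
                matched.append(tag)
        if matched:
            tags[column] = sorted(matched)
    return tags


# Made-up sample data: column name -> sampled values.
sample_table = {
    "contact": ["ann@example.com", "bob@example.org", "n/a",
                "cy@example.net", "di@example.com"],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "city": ["Pune", "Leeds"],
}

column_tags = tag_columns(sample_table)
print(column_tags)
```

Tags produced this way are what an access policy would then refer to, so a policy can be written once against a tag such as `email` rather than against each column individually.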
Okera also provides a self-service portal to quickly audit and analyse sensitive data usage, giving organisations the ability to reliably monitor and track data usage patterns even when datasets increase in size exponentially or some of them are generated by AI engines, the company added.
Okera is also developing a new isolation technology that can support arbitrary workloads while enforcing governance controls without sacrificing performance, Databricks said.
“This technology is in private preview and has been tested by a number of joint customers specifically on their AI workloads. It is the key to ensure enterprises will be covering the whole spectrum of applications in the new world efficiently,” the company added.
Databricks to integrate Okera’s capabilities with Unity Catalog
After the acquisition, Databricks intends to integrate Okera’s capabilities with Unity Catalog, the data governance layer inside its lakehouse offering, within the next year.
“Our customers will benefit from being able to use AI to discover, classify and govern all their data, analytics, and AI assets (including ML models and model features) with attribute-based and intent-based access policies,” Databricks said.
The self-service portal from Okera will help enterprises with end-to-end data observability, including tracing data lineage and usage of sensitive data, on the entire lakehouse, the company added.
Databricks said the combination of these capabilities will enable enterprises to use a single permission model to define access policies across their lakehouse or data estate.
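To make the idea of attribute-based access policies under a single permission model concrete, the following Python sketch checks a user's attributes against policies keyed on data tags. It is an assumption-laden illustration, not the Unity Catalog or Okera API: `Policy`, `can_read`, the `department` attribute and the deny-by-default rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical attribute-based rule: users whose `required_attr`
    value is in `allowed_values` may read data carrying `tag`."""
    tag: str
    required_attr: str
    allowed_values: frozenset


def can_read(user_attrs, column_tags, policies):
    """Return True only if every tag on the column is covered by a
    policy the user satisfies. A tag with no policy denies access;
    a column with no tags is readable by anyone."""
    for tag in column_tags:
        relevant = [p for p in policies if p.tag == tag]
        if not relevant:
            return False
        if not any(user_attrs.get(p.required_attr) in p.allowed_values
                   for p in relevant):
            return False
    return True


# One policy set applied to every table, whatever engine reads it.
policies = [
    Policy("email", "department", frozenset({"support", "compliance"})),
    Policy("us_ssn", "department", frozenset({"compliance"})),
]

support_user = {"department": "support"}
print(can_read(support_user, ["email"], policies))   # allowed by the email policy
print(can_read(support_user, ["us_ssn"], policies))  # blocked: not in compliance
```

Because the rules refer to tags and user attributes rather than to specific tables or users, the same policy list can cover new datasets as soon as they are tagged.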
“This forthcoming acquisition will also enable us to expose APIs for richer policies that other data governance partners can use, providing seamless solutions for our customers,” the company added.
San Francisco-headquartered Okera, which was founded in 2016 by Amandeep Khurana and Nong Li, has raised over $29 million in funding from investors such as Bessemer Venture Partners, Alumni Ventures, and Felicis.
Nong Li, Okera’s co-founder and CEO, is widely known for creating Apache Parquet, the open-source standard storage format that Databricks and others work on.
Nong previously played an instrumental role at Databricks, where he led the vectorised Parquet effort and the code generation effort that resulted in Apache Spark 2.0’s performance improvement, the company said.