We are excited to announce that Unity Catalog held its first meetup since becoming a sandbox project under LF AI & Data. This initial meetup saw over 50 community members in attendance. Moving forward, we will host meetups every two weeks to keep the community updated on our progress and developments.

 

Meetup Highlights

The agenda for this initial meeting included:

  • An overview of Unity Catalog
  • Show and tell from Daft
  • Laying out the priorities ahead
  • Community Q&A

Here are some key takeaways:

What Makes a Great Catalog?

A great catalog enables you to:

  • Manage data and AI assets in one place: Consolidate all your data and AI assets, simplifying management and access.
  • Govern assets through a single source of truth: Maintain consistent and reliable governance across all assets.
  • Leverage best-of-breed tools with your data: Integrate seamlessly with leading tools and technologies.

Unity Catalog was designed to be the best catalog for modern data and AI workloads, leveraging our years of experience building for customers at the forefront of the industry.

Core Capabilities of Unity Catalog

Catalogs are the source of truth for critical properties about your assets. They manage metadata for the data assets in your lakehouse, providing discovery, governance, and lineage. Here are some vital aspects:

  • Table Metadata for Lakehouse Formats: Manage schema, table names, and other metadata.
  • Governance and Access Control: Implement robust access controls and governance policies.
  • Standardized Metadata for AI Objects: Ensure consistent metadata standards for your AI assets.
  • Advanced Features: Support for multi-table transactions and other advanced capabilities.
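
To make the first two capabilities concrete, here is a minimal Python sketch of a toy in-memory catalog that keeps table metadata and read grants in one place. It is not Unity Catalog's API or implementation: the names ToyCatalog, TableEntry, register, grant_read, describe, and discover are hypothetical, and the example table and principals are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """Metadata the toy catalog keeps for one table (illustrative fields only)."""
    name: str
    data_format: str                 # e.g. "DELTA" or "PARQUET"
    columns: dict[str, str]          # column name -> type name
    readers: set[str] = field(default_factory=set)  # principals allowed to read


class ToyCatalog:
    """Hypothetical in-memory catalog: a single source of truth for metadata and grants."""

    def __init__(self) -> None:
        self._tables: dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        """Record a table's metadata under its name."""
        self._tables[entry.name] = entry

    def grant_read(self, table: str, principal: str) -> None:
        """Allow a principal to read a registered table."""
        self._tables[table].readers.add(principal)

    def describe(self, table: str, principal: str) -> dict[str, str]:
        """Return the schema only if the principal has been granted read access."""
        entry = self._tables[table]
        if principal not in entry.readers:
            raise PermissionError(f"{principal} may not read {table}")
        return dict(entry.columns)

    def discover(self, principal: str) -> list[str]:
        """List the tables a principal is allowed to see, sorted by name."""
        return sorted(n for n, e in self._tables.items() if principal in e.readers)


# Hypothetical table and principal, for illustration only.
catalog = ToyCatalog()
catalog.register(TableEntry("sales.orders", "DELTA", {"id": "BIGINT", "total": "DOUBLE"}))
catalog.grant_read("sales.orders", "analyst")
```

Because every lookup goes through the same object, the access check and the metadata cannot drift apart: calling describe for a principal without a grant raises PermissionError, and discover only lists tables that principal may read.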

Challenges with Current Cloud Data Platforms

Today’s cloud data platforms face several challenges:

  1. Lack of Open Access: Many platforms lack open access to data and metadata, leading to vendor lock-in and expensive compute requirements.
  2. Arbitrary Siloing of Data and AI Assets: Siloed data and AI assets result in duplicate work and poor performance.
  3. Inconsistent and Difficult Governance: Existing open catalogs often lack robust governance support, which leads to over- or under-permissioned access.

Our Vision: Open, Interoperable, and Unified

Unity Catalog aims to address these challenges with:

  • Open APIs and OSS Server: Maximize flexibility and customer choice.
  • Interoperable Interface: Support any format, engine, data, and AI asset.
  • Unified Governance: Consistent governance across tabular data, non-tabular data, and AI assets.

Available Today

Upcoming Developments

We have several exciting developments in the pipeline:

  • Engine Client Integrations: Including Daft, DuckDB, CelerData / StarRocks, PuppyGraph.
  • API v0.1 SDK Release: Adding more tests, fixing issues, and creating starter issues for community contributors.
  • Improved Deployability: Building Docker images and Maven packages.
  • Documentation: The MkDocs PR has been merged, improving our documentation.
  • UI Collaboration: Working with open-source catalog UI projects to build out UI clients.

Get Involved

We invite you to collaborate with us and contribute to the project. Visit our website and GitHub repository for more information.

Thank you to everyone who attended the first meetup. We look forward to seeing you at the next one and working together to build the best catalog for modern data and AI workloads.