LF AI & Data Blog

Marquez Joins LF AI as New Incubation Project

June 16, 2020

The LF AI Foundation (LF AI), the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), is today announcing Marquez as its latest Incubation Project. Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralizes dataset lifecycle management, and much more.

“The Marquez community is excited to join the LF AI. This is the next step for Marquez to become an integral part of the wider data community and be the standard for lineage and metadata collection,” said Julien Le Dem, CTO of Datakin.

“We are very pleased to welcome Marquez to LF AI. Machine learning requires high quality data pipelines and Marquez gives visibility into data quality, enables reproducibility, facilitates operations, and builds accountability and trust,” said Dr. Ibrahim Haddad, Executive Director of LF AI. “We look forward to supporting this project and helping it to thrive under a neutral, vendor-free, and open governance.”

LF AI supports projects via a wide range of benefits, and the first step is joining as an Incubation Project. Full details on why you should host your open source project with LF AI are available here.

Marquez enables highly flexible data lineage queries across all datasets, while reliably and efficiently associating (upstream, downstream) dependencies between jobs and the datasets they produce and consume.
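To make the idea of upstream and downstream dependencies concrete, here is a minimal sketch of a lineage graph in Python. It is an illustration only and does not reflect Marquez's actual storage layer or API; the `LineageGraph` class, its method names, and the job and dataset names are all hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy in-memory lineage graph (hypothetical; not Marquez's API)."""

    def __init__(self):
        # Nodes are ("job", name) or ("dataset", name) tuples.
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def _add_edge(self, src, dst):
        self._downstream[src].add(dst)
        self._upstream[dst].add(src)

    def record_job(self, job, inputs, outputs):
        """Register the datasets a job consumes and produces."""
        for ds in inputs:
            self._add_edge(("dataset", ds), ("job", job))
        for ds in outputs:
            self._add_edge(("job", job), ("dataset", ds))

    def _walk(self, edges, start):
        # Breadth-first traversal collecting every reachable node.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # exclude the start node itself
        return seen

    def upstream(self, kind, name):
        """Everything the given node depends on, directly or indirectly."""
        return self._walk(self._upstream, (kind, name))

    def downstream(self, kind, name):
        """Everything that depends on the given node, directly or indirectly."""
        return self._walk(self._downstream, (kind, name))


graph = LineageGraph()
graph.record_job("clean_orders", inputs=["raw_orders"], outputs=["orders"])
graph.record_job("daily_revenue", inputs=["orders", "fx_rates"], outputs=["revenue"])

# Which jobs and datasets are affected if raw_orders changes?
affected = graph.downstream("dataset", "raw_orders")
```

In this sketch, jobs and datasets are both nodes, so a single traversal answers either "what feeds this dataset?" or "what breaks if this dataset changes?".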

Marquez is a modular system and has been designed as a highly scalable, highly extensible platform-agnostic solution for metadata management. It consists of the following system components:

  • Metadata Repository: Stores all job and dataset metadata, including a complete history of job runs and job-level statistics (e.g., total runs, average runtimes, and success/failure counts).
  • Metadata API: RESTful API enabling a diverse set of clients to begin collecting metadata around dataset production and consumption.
  • Metadata UI: Used for dataset discovery, connecting multiple datasets and exploring their dependency graph.
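As a rough illustration of the job-level statistics mentioned for the Metadata Repository, the following Python sketch derives total runs, average runtime, and success/failure counts from a run history. The `RunRecord` type, the `job_statistics` function, and the sample data are assumptions made for this example; they are not taken from Marquez.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunRecord:
    """One completed job run (hypothetical shape, for illustration)."""
    job: str
    succeeded: bool
    runtime_seconds: float


def job_statistics(runs, job):
    """Aggregate job-level statistics from a list of run records."""
    mine = [r for r in runs if r.job == job]
    successes = sum(1 for r in mine if r.succeeded)
    return {
        "total_runs": len(mine),
        "successes": successes,
        "failures": len(mine) - successes,
        # No average is defined for a job that has never run.
        "average_runtime_seconds": (
            mean(r.runtime_seconds for r in mine) if mine else None
        ),
    }


run_history = [
    RunRecord("daily_revenue", True, 30.0),
    RunRecord("daily_revenue", False, 50.0),
    RunRecord("daily_revenue", True, 40.0),
    RunRecord("clean_orders", True, 12.0),
]

stats = job_statistics(run_history, "daily_revenue")
```

A real repository would more likely compute these aggregates in its database than in application code; the sketch only shows which quantities are involved.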

Marquez’s data model emphasizes immutability and timely processing of datasets. Datasets are first-class values produced by job runs. A job run is linked to versioned code, and produces one or more immutable versioned outputs. Dataset changes are recorded at different points in job execution via lightweight API calls, including the success or failure of the run itself.
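The data model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Marquez's schema or client library: `MetadataStore`, `JobRun`, `DatasetVersion`, and their fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass(frozen=True)
class DatasetVersion:
    """An immutable record of one version of a dataset."""
    dataset: str
    version: int
    produced_by_run: str


@dataclass
class JobRun:
    """A run is tied to the version of the code that executed."""
    run_id: str
    job: str
    code_version: str
    state: RunState = RunState.RUNNING
    outputs: list = field(default_factory=list)


class MetadataStore:
    """Toy in-memory store (hypothetical; stands in for lightweight API calls)."""

    def __init__(self):
        self.runs = {}
        self._versions = {}  # dataset name -> list of DatasetVersion

    def start_run(self, run_id, job, code_version):
        run = JobRun(run_id, job, code_version)
        self.runs[run_id] = run
        return run

    def record_output(self, run_id, dataset):
        # Each write appends a new version; earlier versions are never modified.
        history = self._versions.setdefault(dataset, [])
        version = DatasetVersion(dataset, len(history) + 1, run_id)
        history.append(version)
        self.runs[run_id].outputs.append(version)
        return version

    def finish_run(self, run_id, succeeded):
        self.runs[run_id].state = (
            RunState.COMPLETED if succeeded else RunState.FAILED
        )

    def history(self, dataset):
        return tuple(self._versions.get(dataset, ()))


store = MetadataStore()
store.start_run("run-1", job="daily_revenue", code_version="abc123")
first = store.record_output("run-1", "revenue")
store.finish_run("run-1", succeeded=True)

store.start_run("run-2", job="daily_revenue", code_version="def456")
second = store.record_output("run-2", "revenue")
store.finish_run("run-2", succeeded=False)
```

Because `DatasetVersion` is a frozen dataclass, attempting to modify a recorded version raises an error, which mirrors the emphasis on immutable, versioned outputs. The run's final state is recorded separately, so a failed run still leaves a trace of what it wrote.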

Learn more about Marquez here and be sure to join the Marquez-Announce and Marquez-Technical-Discuss mailing lists to get involved with the community and stay connected on the latest updates.

A warm welcome to Marquez! We look forward to the project’s continued growth and success as part of the LF AI Foundation. To learn how to host an open source project with us, visit the LF AI website.

Author

  • Andrew Bringaze

    Andrew Bringaze is the senior developer for The Linux Foundation. With over 10 years of experience, he focuses on open source code, WordPress, React, and site security.
