Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralizes dataset lifecycle management, and much more. Marquez enables highly flexible data lineage queries across all datasets, while reliably and efficiently associating (upstream, downstream) dependencies between jobs and the datasets they produce and consume. Marquez is a modular system designed as a highly scalable, highly extensible, platform-agnostic solution for metadata management. Marquez’s data model emphasizes immutability and timely processing of datasets.
Marquez is a graduation-stage project of the LF AI & Data Foundation.
Marquez was initially contributed by WeWork in December 2019 as an incubation-stage project and graduated in September 2023.