
Author: Vincent Caldeira, CTO of Red Hat in APAC
This post concludes our three-part series on operationalizing GenAI transparency. Now that we’ve covered transparency frameworks and licensing, we’ll explore the technical tools and standards—like OCI packaging, AI BOMs, and Sigstore—that enable transparency across the AI model lifecycle.
Technical Enablers of Supply Chain Transparency
Licensing and classification frameworks, like the Model Openness Framework (MOF), provide the policy and structure for transparency. However, they are not sufficient on their own. We must also invest in the tooling and formats that embed transparency and security directly into the model development and deployment lifecycle. Several open source projects and standards are leading the way in building these essential technical capabilities.
First, borrowing from DevOps best practices for software, the Open Container Initiative (OCI) specification is being actively extended to support the packaging and distribution of AI/ML artifacts. An initiative within the Cloud Native Computing Foundation (CNCF) aims to standardize the format for packaging models as OCI artifacts. This involves defining how components such as model weights, associated code (training, inference), datasets, and configuration files can be bundled together into a single, versioned OCI artifact. This new standard is a game-changer for the AI supply chain. It lets you use familiar container registries (like Quay and Harbor) to distribute your ML artifacts in a standardized way. Plus, it offers immutable packaging and strong version control, which are vital for ensuring your models are reproducible and traceable. And because it’s built on OCI standards, it makes deployments much simpler, especially in cloud and Kubernetes environments. Projects like KitOps, now a Sandbox project within the CNCF, are demonstrating the practical application of this emerging OCI standard for ML artifacts.
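To make the idea concrete, here is a minimal, hypothetical sketch of how the components listed above might be grouped before a tool such as KitOps builds and pushes the versioned OCI artifact. The directory layout, layer names, and registry reference are illustrative stand-ins, not the actual ModelPack or KitOps schema.

```
# Illustrative only: the layout, layer names, and registry reference below are
# hypothetical stand-ins, not the actual ModelPack or KitOps schema.
from pathlib import Path

MODEL_DIR = Path("customer-support-llm")   # hypothetical local workspace

# Group the components named above into logical "layers" of one OCI artifact.
artifact_layers = {
    "weights":  sorted(MODEL_DIR.glob("weights/*.safetensors")),
    "code":     sorted(MODEL_DIR.glob("src/**/*.py")),
    "datasets": sorted(MODEL_DIR.glob("data/*.parquet")),
    "config":   sorted(MODEL_DIR.glob("*.yaml")),
}

# One versioned reference in an existing registry (for example Quay or Harbor).
oci_reference = "registry.example.com/acme/customer-support-llm:1.2.0"

for kind, files in artifact_layers.items():
    print(f"{kind}: {len(files)} file(s) to bundle into {oci_reference}")
# A packaging tool such as KitOps would then build and push the OCI artifact.
```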
Just as Software Bills of Materials (SBOMs) have become essential for managing dependencies and risks in software supply chains, AI Bills of Materials (AI BOMs) are emerging as an essential asset for understanding the composition of AI systems. Based on the SPDX 3.0 standard, which includes dedicated AI and Dataset profiles, AI BOMs provide machine-readable documentation. They detail a model’s dependencies, the data sources used for training and evaluation, energy usage during development, applicable licenses for all components, security measures, and more. These detailed inventories facilitate licensing audits, enable in-depth risk and bias analysis by downstream users, and are becoming increasingly necessary for demonstrating compliance with regulatory mandates, such as the EU AI Act and FDA AI guidelines for medical devices. AI BOMs support the MOF’s openness principles by capturing granular details about how the model was built and what went into it, even in cases where the full set of components is not openly released under the highest MOF classes.
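As a rough illustration of the kind of machine-readable record an AI BOM provides, the sketch below paraphrases the sorts of properties the SPDX 3.0 AI and Dataset profiles capture. Apart from packageVersion and releaseTime, the field names and all values are illustrative stand-ins rather than exact SPDX property names.

```
# Illustrative AI BOM record: apart from packageVersion and releaseTime, the
# keys paraphrase the kinds of information SPDX 3.0's AI and Dataset profiles
# capture rather than the exact property names, and all values are made up.
import json

ai_bom = {
    "name": "customer-support-llm",
    "packageVersion": "1.2.0",
    "releaseTime": "2025-01-15T09:00:00Z",
    "dependencies": ["pytorch", "transformers"],
    "trainingDatasets": ["support-tickets-2023"],
    "evaluationDatasets": ["helpdesk-eval-v1"],
    "energyConsumption": {"training_kwh": 1200},
    "licenses": {"weights": "CDLA-Permissive-2.0", "code": "Apache-2.0"},
    "securityMeasures": ["prompt-injection filtering"],
    "knownLimitations": ["evaluated on English data only"],
}

# Machine-readable output that audit and compliance tooling can consume.
print(json.dumps(ai_bom, indent=2))
```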
To protect models against tampering, unauthorized modification, and misrepresentation of origin, the Sigstore Model Transparency project under the OpenSSF is applying cryptographic signing to ML models and their associated artifacts. By generating manifests that capture cryptographic hashes of every component within a model release—including weights, configuration files, associated code, and documentation—and signing this manifest, Sigstore provides strong cryptographic proof of integrity and origin. These signatures, stored in public, tamper-evident transparency logs such as Rekor, ensure the verifiability of released models and provide crucial protection against supply chain attacks such as injecting malicious code or poisoned data. They are already well integrated with OCI infrastructure where applicable, and they simplify compliance with organizational and regulatory security policies by allowing users to cryptographically verify the authenticity of the models they consume.
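The core manifest idea can be sketched with nothing more than the standard library: hash every file in a release and record the results in a small manifest, which is then signed. In practice the Sigstore model transparency tooling generates and signs such manifests for you; the release directory name below is hypothetical.

```
# Minimal sketch of the manifest idea using only the standard library: map every
# file in a model release (weights, config, code, docs) to its SHA-256 digest.
# Sigstore's model transparency tooling builds and signs such manifests; the
# release directory name here is hypothetical.
import hashlib
from pathlib import Path

def build_manifest(release_dir: str) -> dict[str, str]:
    """Return a {relative path: sha256 hex digest} map for every file in the release."""
    root = Path(release_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

manifest = build_manifest("customer-support-llm-release")
# The signer signs this small manifest rather than the multi-gigabyte weights,
# and the signature can be recorded in a transparency log such as Rekor.
```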
Operationalizing Transparency Across the LLMOps Lifecycle
Achieving genuine model transparency in the GenAI supply chain demands more than just adopting individual standards or tools: it also requires seamlessly integrating these frameworks and technical initiatives into the day-to-day operation of MLOps pipelines and embedding them within broader organizational governance structures. By doing so, transparency becomes a default outcome of the development process, rather than an afterthought or a manual compliance burden.
Consider the typical LLMOps lifecycle. In the Data Preparation phase, transparency begins by aligning with MOF requirements for datasets, data preprocessing code, and data cards based on the intended openness class. This involves applying appropriate open data licenses to datasets and open source licenses to the preprocessing code, ensuring they are prepared for potential release. Unlike traditional MLOps, where data scientists and ML engineers often control the entire data lifecycle (exploration, acquisition, feature engineering, training, and post-processing), generative AI work frequently starts from a downloaded base model whose training data is largely neither controlled by the end user nor readily accessible. This inherent lack of control over the foundational training data of pre-trained models underscores the importance of following MOF principles from day one, particularly for any additional data used for fine-tuning or RAG, as well as for the model artifacts themselves. Simultaneously, the DataOps pipeline should automate the generation of the AI BOM’s Dataset profile, documenting crucial information about data sources, collection methods, and preprocessing steps.
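A hypothetical DataOps step along these lines might emit a Dataset-profile-style record for the fine-tuning data the organization actually controls. The function and field names are illustrative, not exact SPDX Dataset profile properties.

```
# Hypothetical DataOps pipeline step emitting a Dataset-profile-style record for
# the fine-tuning data the organization controls; field names are illustrative,
# not exact SPDX Dataset profile properties.
def describe_dataset(name: str, source: str, collection: str, steps: list[str]) -> dict:
    return {
        "name": name,
        "dataSource": source,
        "collectionProcess": collection,
        "preprocessingSteps": steps,
        "license": "CDLA-Permissive-2.0",     # example open data license
        "intendedUse": "supervised fine-tuning",
    }

dataset_record = describe_dataset(
    name="support-tickets-2023",              # hypothetical internal dataset
    source="internal ticketing system export",
    collection="opt-in customer transcripts",
    steps=["deduplication", "PII scrubbing", "tokenization"],
)
```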
Moving into the Model Training stage, the focus shifts to capturing details about the model architecture, training code, intermediate and final model parameters, and metadata as defined by the MOF. Open source licenses are applied to code components, while open data licenses are suitable for parameters and metadata. The AI BOM continues to grow, incorporating information about the training process, hyperparameters, and even energy consumption metrics, enriching the AI profile. To account for iterative changes such as fine-tuning and alignment, which are crucial post-training treatments in Generative AI, a robust versioning approach for the AI BOM is essential. Each significant modification effectively generates a new version of the model, requiring an updated and distinct AI BOM. SPDX 3.0 addresses this through its mandatory packageVersion and releaseTime fields for AI Packages, enabling clear identification and tracking of changes over time. The MOF further emphasizes this by requiring the release of Intermediate Model Parameters (checkpoints and optimizer states) for Class I models, providing granular insight into the model’s evolution.
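A minimal sketch of that versioning discipline, assuming a semantic-version scheme: each fine-tuning or alignment run yields a new AI BOM revision with an updated packageVersion and releaseTime (the two SPDX 3.0 fields noted above). The changeDescription field is a hypothetical provenance note, not an SPDX property.

```
# Sketch of AI BOM versioning across post-training treatments, assuming a
# semantic-version scheme: packageVersion and releaseTime are the SPDX 3.0
# fields named above; changeDescription is a hypothetical provenance note.
from datetime import datetime, timezone

def new_bom_revision(previous: dict, change: str) -> dict:
    major, minor, _ = (int(part) for part in previous["packageVersion"].split("."))
    revised = dict(previous)
    revised["packageVersion"] = f"{major}.{minor + 1}.0"
    revised["releaseTime"] = datetime.now(timezone.utc).isoformat()
    revised["changeDescription"] = change
    return revised

base_bom = {"packageVersion": "1.2.0", "releaseTime": "2025-01-15T09:00:00Z"}
aligned_bom = new_bom_revision(base_bom, "DPO alignment on preference data")
```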
The Model Evaluation phase is where the model’s performance, fairness, and safety are assessed. Transparency here involves preparing evaluation code, evaluation data, evaluation results, and the model card for release, guided by the chosen MOF class. The appropriate open source, open data, and open content licenses are applied to each of these artifacts. The AI BOM is updated with detailed evaluation metrics, documented limitations, and identified biases, aligning with fields in the AI profile and Dataset profile (for evaluation data).
Finally, the Model Packaging and Release stage brings everything together. The MLOps pipeline bundles all the required components according to the self-asserted MOF class. The MOF Configuration File, detailing the contents and licenses, is generated and included. The complete set of artifacts, including the finalized AI BOM and MOF Configuration File, is then packaged into a single, versioned OCI artifact for ML, following the emerging ModelPack standard. This provides a consistent and portable unit for distribution and deployment, leveraging existing OCI registry infrastructure. Crucially, before this standardized OCI artifact is distributed, the pipeline incorporates a model signing step (potentially using Sigstore). This involves creating a cryptographic manifest (such as an in-toto Statement) that includes hashes of all the components within the package—the model weights, code, data, documentation, and AI BOM. This manifest is then cryptographically signed. The resulting signature, along with verification material (like a certificate) and the manifest, can be bundled with or alongside the OCI artifact (e.g., in a Sigstore Bundle format) and/or recorded in a tamper-evident transparency log, like Rekor. This process provides a verifiable guarantee of the artifact’s integrity and provenance from a trusted source.
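A rough sketch of what such a cryptographic manifest could look like as an in-toto Statement follows. The predicate type URI, component names, digests, and MOF class are placeholders rather than a published predicate schema.

```
# Sketch of the signing manifest as an in-toto Statement. The predicate type
# URI, component digests, and names are placeholders, not a published schema.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "customer-support-llm-1.2.0",
         "digest": {"sha256": "<sha256 of the OCI artifact>"}},   # placeholder
    ],
    "predicateType": "https://example.com/model-release/v1",      # placeholder
    "predicate": {
        "components": {                          # per-file digests of the package
            "weights/model.safetensors": "<sha256>",
            "ai_bom.spdx.json": "<sha256>",
            "mof.yaml": "<sha256>",
        },
        "mofClass": "Class II",                  # self-asserted class, illustrative
    },
}
# Signing this statement (for example with Sigstore) yields the signature and
# verification material that travel with, or alongside, the OCI artifact.
```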
This integrated approach extends into Model Deployment and Policy Enforcement. Upon receiving an OCI artifact for ML, deployment systems can automatically verify the model signature before utilizing the model. This verification process checks the signature against the public key or certificate of the producer and, if a transparency log was used (as with Sigstore’s keyless signing), confirms that the signing event is recorded in the log. This ensures that the model and its components have not been tampered with since they were signed by the trusted producer. The embedded AI BOM within the OCI artifact can be used by deployment systems for various checks, such as license compliance verification, identifying known vulnerabilities in included software libraries, and informing risk assessments based on documented biases or limitations. The standardized OCI artifact is easily deployed using existing cloud-native tooling and workflows, ensuring consistency and efficiency.
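Conceptually, the integrity check that follows signature verification amounts to re-hashing each component and comparing it against the signed manifest, as in the sketch below. Real deployment systems would rely on Sigstore or cosign tooling for the signature and transparency log checks themselves, and the paths here are hypothetical.

```
# Conceptual integrity check performed after the signature itself has been
# verified: re-hash each component and compare against the signed manifest.
# Paths are hypothetical; real systems use Sigstore/cosign tooling for the
# signature and transparency log checks.
import hashlib
from pathlib import Path

def verify_components(release_dir: str, signed_manifest: dict[str, str]) -> bool:
    """Return True only if every signed component is present and unmodified."""
    for rel_path, expected_digest in signed_manifest.items():
        candidate = Path(release_dir) / rel_path
        if not candidate.is_file():
            print(f"missing component: {rel_path}")
            return False
        if hashlib.sha256(candidate.read_bytes()).hexdigest() != expected_digest:
            print(f"modified component: {rel_path}")
            return False
    return True
```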
Throughout the model’s operational life, Monitoring and Governance benefit greatly from the documented transparency. The information captured in the MOF Configuration File and AI BOM serves as crucial documentation for internal and external audits, compliance checks against regulations (like the EU AI Act), and tracking the lineage and components of models in production. Furthermore, the transparency log provides an auditable record of signing events, contributing to accountability. This systematic integration across the MLOps lifecycle, supported by organizational governance and appropriate tooling, transforms the concept of transparency into a tangible and verifiable reality.
Toward Operational Transparency: What’s Next
As we have just seen, a fully transparent GenAI supply chain does not have to be a distant dream. In fact, it’s quickly becoming a tangible reality through coordinated efforts across communities, foundations, and industry partners. However, we’re not done yet. Realizing the full potential of this transparency ecosystem requires continued focus and collaboration.
What’s still needed includes tighter toolchain integration: the MOF classification and documentation, SPDX AI BOM generation, OCI packaging, and Sigstore signing need to be seamlessly woven into end-to-end MLOps pipelines. This ensures that transparency and security are built in by design, becoming the default outcome of the development process rather than ad hoc activities. We need to develop and share common reference implementations and templates that demonstrate how to operationalize transparency effectively across different MLOps platforms and workflows, standardizing adoption practices. Cultivating an automation-first mindset is also paramount: from automated license validation and AI BOM generation to automated OCI packaging and model signing, automation is key to scaling responsible AI practices across organizations. Finally, wider ecosystem adoption is crucial. Model hubs such as Hugging Face, enterprises consuming AI models, and open source projects developing foundational AI technologies must align on these transparency practices to make them ubiquitous and unlock the full benefits of a trustworthy AI supply chain.
Conclusion
The case for transparency in GenAI isn’t just a matter of regulatory compliance: it’s critical for building trust and ensuring the responsible development and deployment of powerful AI systems that increasingly shape our lives. The opacity of current models and the prevalence of openwashing demand a structured and technical response. Through initiatives championed by communities like the LF AI & Data Generative AI Commons, and supporting technical standards and tools from other Linux Foundation communities such as the CNCF and OpenSSF, open source is paving the way for a transparent, verifiable, and secure AI future. Operationalizing these elements within MLOps pipelines transforms transparency from an abstract ideal into a fundamental, automated practice. If your organization is releasing GenAI models, or simply adopting them, now is the time to integrate these essential practices into your pipelines and policies as a step toward building trustworthy AI.