By Animesh Singh, Dan Sun, Alexa Griffith on behalf of KServe community
Today, we are announcing KServe is joining the LF AI & Data Foundation as its latest Incubation Project. LF AI & Data is the organization building an ecosystem to sustain innovation in artificial intelligence (AI) and data open source projects.
Released and published as open source by the KServe Project Community, in collaboration with Bloomberg, Google, IBM, NVIDIA, and Seldon, KServe (originally known as KFServing) provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, Scikit-learn, PyTorch, and ONNX. KServe encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU autoscaling, scale-to-zero, and canary rollouts to your ML deployments. It enables a simple, pluggable, and complete story for production ML serving, including prediction, pre-processing, post-processing, and explainability.
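To make the "high abstraction" point concrete, deploying a model with KServe boils down to applying a single InferenceService custom resource. The sketch below follows KServe's quickstart pattern; the resource name and storageUri are illustrative placeholders for your own model location.

```yaml
# A minimal KServe InferenceService: KServe handles the server,
# networking, autoscaling, and health checks behind this spec.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris          # illustrative name
spec:
  predictor:
    sklearn:
      # Points at a stored model; replace with your own bucket/path.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Applying this manifest with `kubectl apply -f` yields a versioned, autoscaled HTTP endpoint for the model, without any serving boilerplate from the researcher.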
For current users of KServe (and KFServing), please take a few minutes to answer this short survey and provide your feedback!
LF AI & Data supports projects via a wide range of services, and the first step is joining as an Incubation Project. Learn more about KServe on its GitHub repo and join the KServe-Announce Mailing List and KServe-Technical-Discuss Mailing List.
NVIDIA continues to be an active contributor to the KServe open source community project to support effortless deployment of machine learning models at scale. NVIDIA Triton Inference Server works in lockstep with KServe to encapsulate the complexity of deploying and scaling AI in Kubernetes via its serverless inferencing framework.
– Paresh Kharya, Senior Director of Product Management for accelerated computing at NVIDIA.
IBM is both a co-founder and an adopter of KServe. From the inception of KServe (as KFServing in the Kubeflow community) to its current move to LF AI & Data, we have been a major champion. For internet-scale AI applications like IBM Watson Assistant and IBM Watson Natural Language Understanding, hundreds of thousands of models run concurrently. ModelMesh from IBM, open sourced and available as part of the KServe project, solves the challenge of costly container management for us, effectively allowing us to run hundreds of thousands of models in a single production deployment with a minimal footprint.
– Animesh Singh, Distinguished Engineer and Director, Watson AI and Data Open Technology
Bloomberg: As we expand the use of AI in the Bloomberg Terminal and our enterprise products, it is critical that we are able to move quickly from idea to prototype to production. It is similarly critical to ensure that models evolve seamlessly once built to accommodate changes in data. This is important not just for building better products faster, but also to ensure that we unlock the creative potential of our AI researchers without burdening them with writing tons of boilerplate code. In this regard, I am both excited and grateful that KServe, which Bloomberg helped found and lead the development of, has taken such strides.
– Anju Kambadur, Head of AI Engineering
CoreWeave Cloud delivers compute on a Kubernetes-native infrastructure, empowering clients building AI-powered applications with fast spin-up times and responsive autoscaling for performant and efficient compute consumption. KServe is the cornerstone of our inference solution, serving models over several thousand GPUs. KServe makes it possible to scale ML deployments, while saving months of DevOps. A ‘must have’ first-class feature in any inference stack.
– Peter Salanki, Director of Engineering, CoreWeave
Naver Search: As South Korea’s leading search engine, Naver Search receives millions of search requests daily and needs 24/7 availability. KServe has allowed us to modernize our AI serving infrastructure and provides the tools needed to handle the traffic scaling differences between day and night cycles. By providing a standardized interface on top of Knative and Kubernetes, KServe allows our AI researchers to focus on creating better models, and putting their hard work into production without becoming experts in delivering and managing highly-available backend services.
– Mark Winter, Software Engineer
Inspur AIStation supports hundreds of users managing thousands of AI models. As a core component of Inspur AIStation, KServe allows us to deploy a model in a minute and scale models in seconds. With its flexible interface, we can easily deploy customized AI apps and define our own ServingRuntimes. As members of the KServe project, we will keep contributing to KServe to improve its performance and stability.
– Qingshan Chen, Software Engineer, Inspur
The latest KServe 0.8 release includes features such as a new ServingRuntime custom resource, ModelMesh multi-namespace reconciliation, gRPC protocol support between transformer and predictor, and KServe v2 REST API support for the TorchServe runtime. Joining LF AI & Data brings us one step closer to reaching the KServe version 1.0 milestone. The roadmap for v1.0 features a stabilized API unifying the ModelMesh and single-model serving deployments, and more advanced inference graph capabilities.
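The new ServingRuntime custom resource lets platform teams register their own model servers alongside the built-in ones. The sketch below is illustrative only: the runtime name, image, and version strings are placeholders, and the exact field set should be checked against the KServe 0.8 reference documentation.

```yaml
# Illustrative ServingRuntime sketch: declares which model formats a
# custom server container can handle, so InferenceServices can be
# scheduled onto it. Names and image are hypothetical placeholders.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-sklearn-runtime   # hypothetical runtime name
spec:
  supportedModelFormats:
    - name: sklearn
      version: "1"
  containers:
    - name: kserve-container
      image: example.registry.io/custom-sklearn-server:latest  # placeholder image
```

An InferenceService can then reference a runtime like this instead of relying on a built-in server, which is what enables the pluggable single-model and ModelMesh deployments the roadmap aims to unify.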
To learn how to host an open source project with LF AI & Data, visit the LF AI & Data website.
KServe Key Links
LF AI & Data Resources
- Learn about membership opportunities
- Explore the interactive landscape
- Check out our technical projects
- Join us at upcoming events
- Read the latest announcements on the blog
- Subscribe to the mailing lists
- Follow us on Twitter or LinkedIn
- Access other resources on LF AI & Data’s GitHub or Wiki