Kompute Releases v0.8.0 to Continue Advancing Cross-Vendor GPU Acceleration

By September 16, 2021September 29th, 2021Blog

500 Github Star Milestone, Edge-Device Support, CNN Implementations, Variable Types, MatMul Benchmarks, and Binary Optimisations

Kompute, an LF AI & Data Foundation Sandbox-Stage Project advancing the cross-vendor GPU acceleration ecosystem, has released version 0.8.0 which includes major milestones including reaching 500 github stars, edge-device extensions, convolutional neural network (CNN) implementations, variable data types and more. Kompute is a general purpose GPU compute framework for AI & Machine Learning applications which works across cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). The Kompute framework provides a flexible interface that can be adopted by mobile, desktop, cloud and edge applications to enable highly optimizable GPU acceleration. The framework includes a high-level Python interface for advanced data processing use-cases, as well as an extensible low-level C++ interface that provides high performance device-specific optimizations.

The newly released 0.8.0 version of Kompute introduces major improvements to the general cross-platform compatibility and GPU acceleration features of Kompute. The high level summary of the highlights are as follows:

  • Milestone of 500 Github repo stars
  • Broader edge-device support with mesa-driver integration
  • Convolutional Neural Network (CNN) Implementations
  • Support for variable types across GPU parameters
  • Semi-optimized Matrix Multiplication Kernel Benchmark implementation
  • Significant reduction on 3rd party dependencies (15mb->~1mb binary)

If you are interested to learn more, you can join us at our next “GPU Acceleration” monthly call on September 28th at 9:00 EST / 13:00 UTC / 20:00 CST where we will be covering Kompute updates as well as general cross-vendor GPU Acceleration topics.

We will also be giving a talk at CppCon2021 this year, so if you are around please drop by our talk and say hello, or feel free to ask any questions during the Q&A.

The Kompute repo reaches 500 Github stars

We are thrilled to see the fantastic growth and adoption of the Kompute Project, as well as the great discourse that it has continuously encouraged to further the cross-vendor GPU acceleration ecosystem. Today we celebrate the Kompute Project reaching 500 stars in Github, which is a major milestone following from Kompute’s one-year birthday last month. Github stars can become a shallow metric if that’s the only thing that is being used to calculate a project’s growth, so we will be keen to identify other metrics that allow us to ensure our community grows steadily, including number of contributors, contributions, community interactions in our discord, etc.

Broader edge-device support with mesa-driver integration

As part of our 0.8.0 release we have significantly extended edge-device support to hundreds of devices by supporting mesa-drivers as first-class components thanks to this great external contribution. We have added an official tutorial that showcases the integration with the mesa Broadcom drivers running in a Raspberry Pi, which can be adopted across other edge devices for GPU acceleration implementations.

This is a fantastic addition as it showcases the flexibility of the capabilities of Kompute. The example required advanced GPU computing concepts to address some short-comings of limited hardware, such as the need to expose means to add GPU extensions explicitly, as well as adding flexible memory barriers operations that can be used to ensure consistency in more limited devices with non-coherent GPU memory.

Convolutional Neural Network (CNN) Implementations

We have introduced a high level example that provides an implementation of a convolutional neural network (CNN) that enables for image resolution upscaling, which means that images can improve their quality through purely the machine learning implementation. This is another fantastic external contribution from the great Kompute community.

This example showcases how to import a pre-trained deep learning model. We create the Kompute code that loads model weights, then we create Kompute logic that performs inference on image, and we run model against image to perform resolution upscale on any image.

Small imageVGG7 InferenceLarger Image

Support for variable types across GPU parameters

By default, the simplified interfaces of Kompute will expose the ability to deal with float scalar types, which may be enough to get through the basic conceptual examples. However, as you develop real-world applications, more specialized types may be required for the different components that Kompute exposes to perform computation in the GPUs.

In version 0.8.0 of Kompute we introduce richer support for variable types across the Python and C++ interface that allow users to set different scalar values, and in some cases user-defined structs for their Kompute resources. More specifically, we have added support for multiple scalar types for the Kompute Tensor resource, multiple scalar type & arbitrary user-defined struct support for Kompute Push Constants, and multiple scalar types for Specialization Constants.

Semi-optimized Matrix Multiplication Kernel Benchmark example

In this release of Kompute we have received another great external contribution of an example that starts off with a naive implementation of a matrix multiplication algorithm, and then shows how to iteratively improve performance with high-level benchmarking techniques . This highlights how increasing matrix size can also increase the performance in GFLOPS in the specific optimizations introduced. The initial experimentation was based on the SGEMM in WebGL2-compute article on the public library ibiblio.org, and explores some initial improvements with basic and slightly more optimized tiling. This is still a work we would be interested to explore further and would be great to receive further contributions.

Significant reduction on 3rd party dependencies

The Kompute project has now been updated to reduce the 3rd party dependencies. This release removes some dependencies in favour of modularised functional utilities that are only used in the testing framework. This results in a staggering optimization of the binary, reducing the size by an order of magnitude, bringing the library binary from 15MB down to 1MB. This also simplifies cross-platform compatibility, as it requires less dependencies to build in different architectures.

The main dependency that has been removed is GLSLang, which was being used to provide a single function to perform online shader compilation primarily for the tests and simple examples. Instead we have now moved to allowing users to bring in their preferred method of performing compilation of shaders to SPIR-V, whilst still providing guidance on how Kompute users would be able to do it through simple methods.

Join the Kompute Project

The core objective of the Kompute project is to contribute to and further the GPU computing ecosystem across both, scientific and industry applications, through cross-vendor graphics card tooling and capabilities. We have seen very positive reception and adoption of Kompute across various development communities, including advanced data processing use-cases in mobile applications, game development engines, edge device and cloud, and we would love to engage with the broader community to hear thoughts on suggestions and improvements.

The Kompute Project invites you to adopt or upgrade to version 0.8.0 and welcomes feedback. For details on the additional features and improvements, please refer to the release notes here

As mentioned previously, if you are interested to learn more, you can join us at our next GPU Acceleration call on September 28th at 9:00 EST / 13:00 UTC / 20:00 CST where we will be covering Kompute updates as well as general cross-vendor GPU Acceleration topics.

Kompute Key Links

LF AI & Data Resources