PCIe Bandwidth-Aware Scheduling for Multi-Instance GPU
Yan-Mei Tang, Wei-Fang Sun, Hsu-Tzu Ting, and 3 more authors
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2025
The increasing computational power of GPUs has driven advancements across various domains, especially in scientific computing and machine learning. However, lighter workloads often do not fully utilize a GPU’s capacity, leading to inefficiencies. The Multi-Instance GPU (MIG) feature in NVIDIA A100 GPUs addresses this issue by allowing a single GPU to be divided into multiple smaller, isolated instances, thus improving resource allocation for multi-tenant environments. While MIG provides enhanced isolation and predictable performance, we observed that PCIe bandwidth remains a shared resource, which can lead to contention when multiple instances require high bandwidth. This contention can cause performance degradation, particularly in concurrent machine learning inference tasks. In this paper, we identify and address this issue, and are among the first to demonstrate PCIe bandwidth contention across MIG instances in tasks with high bandwidth demands. We propose a PCIe bandwidth-aware MIG scheduler that predicts and mitigates contention by preventing bandwidth-intensive jobs from being scheduled simultaneously on the same GPU. Our scheduler leverages a performance model to quantify PCIe contention severity, enabling more efficient scheduling decisions. Experimental results show that the proposed scheduler reduces job completion times by approximately 18%, improving GPU resource utilization in both real-world and larger-scale simulated environments.
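The core idea of such a bandwidth-aware placement policy can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's actual scheduler: the class names, the per-job bandwidth estimates, the 25 GB/s capacity figure, and the contention score (predicted PCIe oversubscription ratio) are all assumptions standing in for the paper's performance model.

```python
# Hypothetical sketch of PCIe bandwidth-aware placement across MIG-capable GPUs.
# All names, numbers, and the contention metric here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    pcie_gbps: float  # estimated PCIe bandwidth demand (GB/s)

@dataclass
class GPU:
    name: str
    pcie_capacity_gbps: float  # assumed host-link capacity per GPU
    jobs: list = field(default_factory=list)

    def used_bandwidth(self) -> float:
        return sum(j.pcie_gbps for j in self.jobs)

def contention_score(gpu: GPU, job: Job) -> float:
    """Toy performance-model proxy: predicted PCIe oversubscription ratio."""
    return (gpu.used_bandwidth() + job.pcie_gbps) / gpu.pcie_capacity_gbps

def schedule(job: Job, gpus: list) -> GPU:
    """Place the job on the GPU with the lowest predicted PCIe contention,
    so two bandwidth-heavy jobs avoid sharing one GPU's link."""
    best = min(gpus, key=lambda g: contention_score(g, job))
    best.jobs.append(job)
    return best

gpus = [GPU("A100-0", 25.0), GPU("A100-1", 25.0)]
schedule(Job("inference-A", 18.0), gpus)  # heavy job -> A100-0
schedule(Job("inference-B", 4.0), gpus)   # light job -> A100-1 (lower score)
placed = schedule(Job("inference-C", 18.0), gpus)
# The second bandwidth-heavy job lands on the GPU not already saturated.
```

In this toy run the two 18 GB/s jobs end up on different GPUs, mirroring the paper's goal of keeping co-scheduled MIG instances from saturating a shared PCIe link; the real scheduler would derive demand estimates and contention severity from its measured performance model rather than fixed numbers.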