PCIe Bandwidth-Aware Scheduling for Multi-Instance GPU
Yan-Mei Tang, Wei-Fang Sun, Hsu-Tzu Ting, and 3 more authors
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2025
The increasing computational power of GPUs has driven advancements across various domains, especially in scientific computing and machine learning. However, lighter workloads often do not fully utilize a GPU’s capacity, leading to inefficiencies. The Multi-Instance GPU (MIG) feature in NVIDIA A100 GPUs addresses this issue by allowing a single GPU to be divided into multiple smaller, isolated instances, thus improving resource allocation for multi-tenant environments. While MIG provides enhanced isolation and predictable performance, we observed that PCIe bandwidth remains a shared resource, which can lead to contention when multiple instances require high bandwidth. This contention can cause performance degradation, particularly in concurrent machine learning inference tasks. In this paper, we identify and address this issue, and are among the first to demonstrate PCIe bandwidth contention across MIG instances in tasks with high bandwidth demands. We propose a PCIe bandwidth-aware MIG scheduler that predicts and mitigates contention by preventing bandwidth-intensive jobs from being scheduled simultaneously on the same GPU. Our scheduler leverages a performance model to quantify PCIe contention severity, enabling more efficient scheduling decisions. Experimental results show that the proposed scheduler reduces job completion times by approximately 18%, improving GPU resource utilization in both real-world and larger-scale simulated environments.
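The core idea of such a bandwidth-aware placement policy can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's actual scheduler: the class names, the per-job bandwidth estimates, the 25 GB/s capacity figure, and the contention score (predicted PCIe oversubscription ratio) are all assumptions standing in for the paper's performance model.

```python
# Hypothetical sketch of PCIe bandwidth-aware placement across MIG-capable GPUs.
# All names, numbers, and the contention metric here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    pcie_gbps: float  # estimated PCIe bandwidth demand (GB/s)

@dataclass
class GPU:
    name: str
    pcie_capacity_gbps: float  # assumed host-link capacity per GPU
    jobs: list = field(default_factory=list)

    def used_bandwidth(self) -> float:
        return sum(j.pcie_gbps for j in self.jobs)

def contention_score(gpu: GPU, job: Job) -> float:
    """Toy performance-model proxy: predicted PCIe oversubscription ratio."""
    return (gpu.used_bandwidth() + job.pcie_gbps) / gpu.pcie_capacity_gbps

def schedule(job: Job, gpus: list) -> GPU:
    """Place the job on the GPU with the lowest predicted PCIe contention,
    so two bandwidth-heavy jobs avoid sharing one GPU's link."""
    best = min(gpus, key=lambda g: contention_score(g, job))
    best.jobs.append(job)
    return best

gpus = [GPU("A100-0", 25.0), GPU("A100-1", 25.0)]
schedule(Job("inference-A", 18.0), gpus)  # heavy job -> A100-0
schedule(Job("inference-B", 4.0), gpus)   # light job -> A100-1 (lower score)
placed = schedule(Job("inference-C", 18.0), gpus)
# The second bandwidth-heavy job lands on the GPU not already saturated.
```

In this toy run the two 18 GB/s jobs end up on different GPUs, mirroring the paper's goal of keeping co-scheduled MIG instances from saturating a shared PCIe link; the real scheduler would derive demand estimates and contention severity from its measured performance model rather than fixed numbers.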