no code implementations • 9 Aug 2024 • Yiqi Liu, Yuqi Xue, Yu Cheng, Lingxiao Ma, Ziming Miao, Jilong Xue, Jian Huang
As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing, inter-core communication has recently been enabled by high-bandwidth, low-latency interconnect links on the chip (e.g., Graphcore IPU).
no code implementations • 7 Aug 2024 • Yuqi Xue, Yiqi Liu, Lifeng Nai, Jian Huang
To maximize resource utilization while ensuring reasonable quality of service, a natural approach is to virtualize NPUs so that multi-tenant ML services can share resources efficiently.
1 code implementation • 13 Oct 2023 • Haoyang Zhang, Yirui Eric Zhou, Yuqi Xue, Yiqi Liu, Jian Huang
Based on this unified GPU memory and storage architecture, G10 utilizes compiler techniques to characterize the tensor behaviors in deep learning workloads.
no code implementations • 14 Nov 2022 • Yuqi Xue
In this paper, we propose a novel strategy for text-independent speaker identification systems: Multi-Label Training (MLT).