no code implementations • 27 Nov 2023 • Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, YongDeok Kim, Minsoo Rhu
As large language models (LLMs) become widespread in various application domains, a critical challenge the AI community is facing is how to train these large AI models in a cost-effective manner.
no code implementations • 23 Feb 2023 • Yujeong Choi, John Kim, Minsoo Rhu
While providing low latency is a fundamental requirement in deploying recommendation services, achieving high resource utilization is also crucial for cost-effectively maintaining the datacenter.
no code implementations • 27 Feb 2022 • Yunseong Kim, Yujeong Choi, Minsoo Rhu
However, maximizing server utilization and system throughput is also crucial for ML service providers as it helps lower the total-cost-of-ownership.
no code implementations • 25 Oct 2020 • Yujeong Choi, Yunseong Kim, Minsoo Rhu
In cloud ML inference systems, batching is an essential technique for increasing throughput, which in turn helps optimize total-cost-of-ownership.
no code implementations • 15 Nov 2019 • Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu
To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely employed to accelerate deep learning algorithms.
1 code implementation • 6 Sep 2019 • Yujeong Choi, Minsoo Rhu
To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests.