Search Results for author: Yujeong Choi

Found 6 papers, 1 paper with code

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training

no code implementations • 27 Nov 2023 • Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, YongDeok Kim, Minsoo Rhu

As large language models (LLMs) become widespread in various application domains, a critical challenge facing the AI community is how to train these large AI models in a cost-effective manner.

Language Modelling • Large Language Model

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

no code implementations • 23 Feb 2023 • Yujeong Choi, John Kim, Minsoo Rhu

While providing low latency is a fundamental requirement in deploying recommendation services, achieving high resource utilization is also crucial for cost-effectively maintaining the datacenter.

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

no code implementations • 27 Feb 2022 • Yunseong Kim, Yujeong Choi, Minsoo Rhu

However, maximizing server utilization and system throughput is also crucial for ML service providers, as it helps lower the total cost of ownership.

Scheduling

LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference

no code implementations • 25 Oct 2020 • Yujeong Choi, Yunseong Kim, Minsoo Rhu

In cloud ML inference systems, batching is an essential technique for increasing throughput, which helps optimize total cost of ownership.

BIG-bench Machine Learning • Scheduling

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

no code implementations • 15 Nov 2019 • Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu

To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are being widely utilized to accelerate deep learning algorithms.

Management • Translation

PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

1 code implementation • 6 Sep 2019 • Yujeong Choi, Minsoo Rhu

To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests.

Scheduling
