Search Results for author: Alvin Cheung

Found 18 papers, 11 papers with code

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

1 code implementation • 22 Apr 2024 • Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica

Within this space, we show that GPU cost and performance are not linearly related, and identify three key LLM service characteristics that significantly affect which GPU type is the most cost-effective: model request size, request rate, and latency service-level objective (SLO).
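
The trade-off can be made concrete with a toy cost model. The sketch below is not Mélange's allocation algorithm; all GPU names, prices, throughputs, and latencies are hypothetical placeholders. It simply picks the cheapest GPU configuration that still meets a latency SLO for a given request size and request rate, which is enough to show how the cost-optimal GPU type flips as the SLO tightens.

```python
# Toy sketch of cost-aware GPU selection; NOT Mélange's algorithm.
# All GPU profiles below are hypothetical placeholder numbers.
import math
from dataclasses import dataclass

@dataclass
class GpuProfile:
    name: str
    hourly_cost: float      # $/GPU/hour (hypothetical)
    tokens_per_sec: float   # sustained throughput at this request size (hypothetical)
    latency_sec: float      # per-request latency at this request size (hypothetical)

def cheapest_config(profiles, request_tokens, requests_per_sec, latency_slo_sec):
    """Return (profile, num_gpus, cost_per_hour) for the cheapest config meeting the SLO."""
    best = None
    demand = request_tokens * requests_per_sec  # offered load in tokens/sec
    for p in profiles:
        if p.latency_sec > latency_slo_sec:
            continue  # this GPU type cannot meet the latency SLO at all
        num_gpus = max(1, math.ceil(demand / p.tokens_per_sec))
        cost = num_gpus * p.hourly_cost
        if best is None or cost < best[2]:
            best = (p, num_gpus, cost)
    return best

profiles = [
    GpuProfile("small-gpu", hourly_cost=1.0, tokens_per_sec=900, latency_sec=0.8),
    GpuProfile("large-gpu", hourly_cost=4.0, tokens_per_sec=5000, latency_sec=0.3),
]
# A loose SLO favors the cheap GPU; tightening the SLO flips the choice.
print(cheapest_config(profiles, request_tokens=512, requests_per_sec=3.0, latency_slo_sec=1.0))
print(cheapest_config(profiles, request_tokens=512, requests_per_sec=3.0, latency_slo_sec=0.5))
```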

Language Modelling

Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks

1 code implementation • 7 Mar 2024 • Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung

We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task.

Code Completion

AST-T5: Structure-Aware Pretraining for Code Generation and Understanding

1 code implementation • 5 Jan 2024 • Linyuan Gong, Mostafa Elhoushi, Alvin Cheung

Large language models (LLMs) have made significant advancements in code-related tasks, yet many LLMs treat code as simple sequences, neglecting its structured nature.
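
To make the "structured nature" point concrete, here is a minimal illustration using Python's standard `ast` module (not AST-T5 itself): the same snippet viewed as a flat token sequence versus as a syntax tree.

```python
# Flat-sequence view vs. tree-structured view of the same code snippet.
import ast

source = "def area(r):\n    return 3.14159 * r * r\n"

# Flat view: just a sequence of tokens, with no explicit structure.
print(source.split())

# Structured view: a tree of nodes (FunctionDef -> Return -> BinOp ...),
# the kind of structure that structure-aware pretraining can exploit.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))
```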

Code Generation

Online Speculative Decoding

no code implementations • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang

We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs.
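
For context, speculative decoding lets a small draft model propose several tokens that a larger target model then verifies; the online variant additionally keeps distilling the draft model on live query traffic. The sketch below shows only the basic propose-and-verify loop with toy stand-in models (greedy verification), not the paper's implementation.

```python
# Schematic propose-and-verify loop of speculative decoding.
# The "models" are toy stand-ins, not real LLMs; greedy verification only.
import random

def draft_model(prefix, k):
    """Cheap draft model: proposes k candidate next tokens."""
    return [random.choice("0123456789") for _ in range(k)]

def target_model(prefix):
    """Expensive target model: returns the 'correct' next token for a prefix."""
    return str(len(prefix) % 10)

def speculative_decode(prompt, steps=5, k=4):
    out = list(prompt)
    for _ in range(steps):
        proposal = draft_model(out, k)
        # In a real system the target model scores all k drafted tokens in one
        # batched forward pass; here we just check them one by one.
        for tok in proposal:
            if tok == target_model(out):
                out.append(tok)          # accepted draft token: "free" progress
            else:
                break                    # first mismatch ends the accepted run
        out.append(target_model(out))    # the target model always adds one token
    return "".join(out)

print(speculative_decode("42"))
```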

Knowledge Distillation

Spatialyze: A Geospatial Video Analytics System with Spatial-Aware Optimizations

1 code implementation • 7 Aug 2023 • Chanwut Kittivorawong, Yongming Ge, Yousef Helal, Alvin Cheung

In this paper, we describe Spatialyze, a new framework for end-to-end querying of geospatial videos.

Management

SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics

no code implementations • 29 May 2023 • Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen

This allows SlimFit to freeze up to 95% of layers and reduce the overall on-device GPU memory usage of transformer-based models such as ViT and BERT by an average of 2.2x, across different NLP and CV benchmarks/datasets such as GLUE, SQuAD 2.0, CIFAR-10, CIFAR-100 and ImageNet, with an average degradation of 0.2% in accuracy.
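
As a rough illustration of the memory-saving mechanism (selective layer freezing), the PyTorch sketch below freezes a fixed fraction of layers using a placeholder score. SlimFit's actual freezing criterion is derived from training dynamics; the model, scores, and fractions here are arbitrary.

```python
# Illustrative layer freezing for memory-efficient fine-tuning.
# The freezing criterion is a placeholder, not SlimFit's training-dynamics heuristic.
import torch
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(128, 128) for _ in range(10)])

# Placeholder per-layer "activity" scores (random here; SlimFit uses training dynamics).
scores = {i: torch.rand(1).item() for i, _ in enumerate(model)}

freeze_fraction = 0.8  # freeze the 80% least "active" layers
num_frozen = int(freeze_fraction * len(model))
frozen_ids = set(sorted(scores, key=scores.get)[:num_frozen])

for i, layer in enumerate(model):
    for p in layer.parameters():
        p.requires_grad = i not in frozen_ids  # frozen layers get no gradients

# Only trainable parameters go to the optimizer, so optimizer state
# (and the gradients kept for those layers) shrinks accordingly.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
print(f"frozen {num_frozen}/{len(model)} layers, {len(trainable)} trainable tensors")
```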

Quantization · Scheduling

Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers

1 code implementation • 21 May 2023 • Linyuan Gong, Chenyan Xiong, Xiaodong Liu, Payal Bajaj, Yiqing Xie, Alvin Cheung, Jianfeng Gao, Xia Song

This paper explores the effectiveness of model-generated signals in improving zero-shot generalization of text-to-text Transformers such as T5.

Zero-shot Generalization

An Evaluation of Memory Optimization Methods for Training Neural Networks

no code implementations • 26 Mar 2023 • Xiaoxuan Liu, Siddharth Jha, Alvin Cheung

To address this challenge, this paper summarizes the scenarios in which memory optimization methods (MOMs) prove advantageous for model training.

Quantization

ADELT: Transpilation Between Deep Learning Frameworks

no code implementations • 7 Mar 2023 • Linyuan Gong, Jiayi Wang, Alvin Cheung

We propose the Adversarial DEep Learning Transpiler (ADELT), a novel approach to source-to-source transpilation between deep learning frameworks.

NumS: Scalable Array Programming for the Cloud

1 code implementation • 28 Jun 2022 • Melih Elibol, Vinamra Benara, Samyu Yagati, Lianmin Zheng, Alvin Cheung, Michael I. Jordan, Ion Stoica

LSHS is a local search method which optimizes operator placement by minimizing maximum memory and network load on any given node within a distributed system.
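
The placement objective can be illustrated with a small greedy stand-in for that idea (this is not NumS's LSHS implementation; operator costs, units, and cluster size below are made up): each operator is assigned to the node that keeps the maximum per-node memory and network load smallest.

```python
# Greedy stand-in for load-balancing operator placement; NOT NumS's LSHS code.
def place_operators(op_costs, num_nodes):
    """op_costs: list of (memory_cost, network_cost) per operator (hypothetical units)."""
    mem_load = [0.0] * num_nodes
    net_load = [0.0] * num_nodes
    placement = []
    for mem, net in op_costs:
        best_node, best_objective = None, None
        for node in range(num_nodes):
            # Objective: worst (max) memory or network load across all nodes
            # if this operator were placed on `node`.
            objective = max(
                max(mem_load[i] + (mem if i == node else 0.0) for i in range(num_nodes)),
                max(net_load[i] + (net if i == node else 0.0) for i in range(num_nodes)),
            )
            if best_objective is None or objective < best_objective:
                best_node, best_objective = node, objective
        mem_load[best_node] += mem
        net_load[best_node] += net
        placement.append(best_node)
    return placement

# Four operators spread over two nodes so that no node's load dominates.
print(place_operators([(4, 1), (2, 3), (3, 2), (5, 1)], num_nodes=2))
```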

Regression · Scheduling

GACT: Activation Compressed Training for Generic Network Architectures

1 code implementation • 22 Jun 2022 • Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.

Falx: Synthesis-Powered Visualization Authoring

no code implementations • 1 Feb 2021 • Chenglong Wang, Yu Feng, Rastislav Bodik, Isil Dillig, Alvin Cheung, Amy J. Ko

Modern visualization tools aim to allow data analysts to easily create exploratory visualizations.

Human-Computer Interaction · Programming Languages

New Directions in Cloud Programming

1 code implementation • 4 Jan 2021 • Alvin Cheung, Natacha Crooks, Joseph M. Hellerstein, Matthew Milano

Nearly twenty years after the launch of AWS, it remains difficult for most developers to harness the enormous potential of the cloud.

Program Synthesis · Distributed, Parallel, and Cluster Computing · Databases · Operating Systems · Programming Languages

Learning Programmatic Idioms for Scalable Semantic Parsing

no code implementations • IJCNLP 2019 • Srinivasan Iyer, Alvin Cheung, Luke Zettlemoyer

Programmers typically organize executable source code using high-level coding patterns or idiomatic structures such as nested loops, exception handlers and recursive blocks, rather than as individual code tokens.

Code Generation · Semantic Parsing

Mapping Language to Code in Programmatic Context

1 code implementation • EMNLP 2018 • Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer

To study this phenomenon, we introduce the task of generating class member functions given English documentation and the programmatic context provided by the rest of the class.

Learning a Neural Semantic Parser from User Feedback

no code implementations • ACL 2017 • Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, Luke Zettlemoyer

We present an approach for rapidly and easily building natural language interfaces to databases for new domains; their performance improves over time based on user feedback, requiring minimal intervention.

SQL Parsing
