Search Results for author: Qingqing Cao

Found 14 papers, 10 papers with code

IrEne-viz: Visualizing Energy Consumption of Transformer Models

1 code implementation EMNLP (ACL) 2021 Yash Kumar Lal, Reetu Singh, Harsh Trivedi, Qingqing Cao, Aruna Balasubramanian, Niranjan Balasubramanian

IrEne is an energy prediction system that accurately and interpretably predicts the inference energy consumption of a wide range of Transformer-based NLP models.

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

1 code implementation 22 Jan 2024 Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao

Compared to baselines, our experiments show that APT maintains up to 98% of task performance when pruning RoBERTa and T5 models to 40% of their parameters, while retaining 86.4% of LLaMA models' performance with 70% of parameters remaining.

Parameter-Efficient Fine-Tuning

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

1 code implementation 2 Oct 2023 Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi

Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks.

Hallucination Retrieval

AdANNS: A Framework for Adaptive Semantic Search

1 code implementation NeurIPS 2023 Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi

Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations.

Natural Questions Quantization
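The AdANNS snippet relies on matryoshka representations, which are embeddings whose leading coordinates themselves form usable lower-dimensional embeddings. As a rough illustration only, the Python sketch below assumes such embeddings and picks a prefix length from a hypothetical per-query compute budget before running an exact brute-force cosine search. It is not the AdANNS algorithm, which operates on ANNS indices rather than scanning every vector; `prefix_search`, `pick_dim`, the candidate dimensions, and the budget rule are all illustrative assumptions.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    # L2-normalize along the last axis so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def prefix_search(db: np.ndarray, query: np.ndarray, dim: int, k: int) -> np.ndarray:
    """Exact top-k cosine search using only the first `dim` coordinates.

    Assumes matryoshka-style embeddings, where any prefix of a vector is
    itself a usable lower-dimensional embedding (an assumption, not a check).
    """
    db_prefix = normalize(db[:, :dim])
    q_prefix = normalize(query[:dim])
    scores = db_prefix @ q_prefix
    # Sort scores in descending order and keep the indices of the k best items.
    return np.argsort(-scores)[:k]


def pick_dim(budget_flops: int, n_items: int, dims=(8, 16, 32, 64)) -> int:
    # Hypothetical cost model: brute-force search costs about n_items * dim
    # multiply-adds. Choose the largest prefix that fits the budget, falling
    # back to the smallest prefix when none fits.
    feasible = [d for d in dims if n_items * d <= budget_flops]
    return max(feasible) if feasible else min(dims)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.standard_normal((1000, 64))          # stand-in embeddings
    query = db[42] + 0.01 * rng.standard_normal(64)
    dim = pick_dim(budget_flops=20_000, n_items=1000)
    print(dim, prefix_search(db, query, dim, k=5))
```

With a budget of 20,000 multiply-adds over 1,000 items, the rule selects a 16-dimensional prefix; shorter prefixes trade retrieval accuracy for less compute per query.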

PuMer: Pruning and Merging Tokens for Efficient Vision Language Models

1 code implementation 27 May 2023 Qingqing Cao, Bhargavi Paranjape, Hannaneh Hajishirzi

Large-scale vision language (VL) models use Transformers to perform cross-modal interactions between the input text and image.

Token Reduction

IrEne: Interpretable Energy Prediction for Transformers

1 code implementation ACL 2021 Qingqing Cao, Yash Kumar Lal, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

We present IrEne, an interpretable and extensible energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models.

Bew: Towards Answering Business-Entity-Related Web Questions

no code implementations 10 Dec 2020 Qingqing Cao, Oriana Riva, Aruna Balasubramanian, Niranjan Balasubramanian

We present a practical approach, called BewQA, that can answer Bew queries by mining a template of the business-related webpages and using the template to guide the search.

Towards Accurate and Reliable Energy Measurement of NLP Models

1 code implementation EMNLP (sustainlp) 2020 Qingqing Cao, Aruna Balasubramanian, Niranjan Balasubramanian

In this work, we show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption.

Question Answering
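Whatever the source of the readings, the energy of an inference run is the integral of power draw over its duration. The Python sketch below is a minimal illustration, assuming power has already been sampled at known timestamps and that a constant idle baseline is an acceptable approximation. The function name, the idle-subtraction step, and the sample values are hypothetical and do not reproduce the paper's measurement setup.

```python
from typing import Sequence


def energy_joules(
    timestamps_s: Sequence[float],
    power_w: Sequence[float],
    idle_w: float = 0.0,
) -> float:
    """Trapezoidal integral of sampled power (W) over time (s), in joules.

    `idle_w` is an assumed constant idle draw subtracted over the whole
    window; real idle power varies with hardware and utilization.
    """
    if len(timestamps_s) != len(power_w) or len(power_w) < 2:
        raise ValueError("need matching timestamp/power sequences of length >= 2")
    total = 0.0
    for i in range(1, len(power_w)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of one trapezoid: average power over the interval times its length.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    duration = timestamps_s[-1] - timestamps_s[0]
    return total - idle_w * duration


if __name__ == "__main__":
    ts = [0.0, 0.5, 1.0, 1.5]
    p = [100.0, 120.0, 140.0, 120.0]
    print(energy_joules(ts, p), energy_joules(ts, p, idle_w=80.0))
```

With samples of 100, 120, 140 and 120 W taken every 0.5 s, the trapezoidal total is 185 J; subtracting an assumed 80 W idle draw over 1.5 s leaves 65 J.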
