no code implementations • 5 Jul 2023 • Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang
Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance.
2 code implementations • 29 Jun 2023 • Erik Lien Bolager, Iryna Burak, Chinmay Datar, Qing Sun, Felix Dietrich
For Barron functions, we show that the $L^2$-approximation error of sampled shallow networks decays at rate $O(1/\sqrt{n})$ in the number of neurons $n$.
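To make the rate concrete, here is a minimal random-feature sketch, assuming i.i.d. sampled hidden weights and a least-squares fit of the outer layer only; the paper's data-driven weight sampling is more refined than this.

```python
# Minimal random-feature illustration of the O(1/sqrt(n)) rate.
# Generic sketch, NOT the paper's data-driven sampling scheme:
# hidden weights are drawn i.i.d. and only the outer layer is fit.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(2000, 1))       # inputs in [-1, 1]
y = np.sin(3.0 * x[:, 0])                        # smooth target function

for n in [10, 100, 1000]:                        # number of hidden neurons
    W = rng.normal(size=(1, n))                  # sampled hidden weights
    b = rng.uniform(-1.0, 1.0, size=n)           # sampled biases
    phi = np.tanh(x @ W + b)                     # random hidden features
    c, *_ = np.linalg.lstsq(phi, y, rcond=None)  # fit outer weights only
    err = np.sqrt(np.mean((phi @ c - y) ** 2))   # empirical L2 error
    print(f"n={n:5d}  L2 error={err:.4f}")
```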
no code implementations • 9 Mar 2023 • Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang
Such large models incur significant resource costs (memory, latency, and dollars) as well as a substantial carbon footprint.
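As a rough illustration of one mitigation, here is a hedged sketch of post-training dynamic int8 quantization in PyTorch; it is a generic technique for cutting memory and latency, not necessarily the exact recipe studied in this paper.

```python
# Post-training dynamic int8 quantization (generic sketch, not this
# paper's exact recipe): Linear layers are stored and executed in int8.
import os
import torch
import torch.nn as nn

model = nn.Sequential(                    # stand-in for a large code LM
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 1024),
)
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m, path="/tmp/m.pt"):
    torch.save(m.state_dict(), path)
    return os.path.getsize(path) / 1e6

print(f"fp32: {size_mb(model):.1f} MB -> int8: {size_mb(quantized):.1f} MB")
```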
1 code implementation • 26 Oct 2022 • Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang
Using these benchmarks, we assess code generation models in a multi-lingual fashion and find that language models generalize to out-of-domain languages, that multi-lingual models have advantages over mono-lingual ones, that few-shot prompting can teach a model new languages, and that zero-shot translation abilities emerge even in mono-lingual settings.
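Execution-based benchmarks of this kind are typically scored with the unbiased pass@k estimator of Chen et al. (2021); a minimal implementation:

```python
# Unbiased pass@k estimator (Chen et al., 2021), commonly used to score
# execution-based code generation benchmarks such as these.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated per problem, c = samples that pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=100, c=7, k=10))  # chance that >=1 of 10 samples passes
```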
1 code implementation • 25 Sep 2022 • Qing Sun, Fan Lyu, Fanhua Shang, Wei Feng, Liang Wan
Continual Learning (CL) learns new tasks sequentially, as humans do, with the goal of achieving both better Stability (S, remembering past tasks) and better Plasticity (P, adapting to new tasks).
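As a point of reference for this trade-off, here is a generic experience-replay sketch (reservoir sampling over past-task examples); it is a standard CL baseline, not this paper's method.

```python
# Generic experience-replay sketch: a common way to trade Stability
# (replaying stored examples from past tasks) against Plasticity (learning
# the current task). Standard baseline, not this paper's approach.
import random

class ReplayBuffer:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.data = []   # (x, y) pairs from past tasks
        self.seen = 0

    def add(self, example):
        # Reservoir sampling keeps a uniform sample over everything seen.
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k: int):
        return random.sample(self.data, min(k, len(self.data)))
```

Mixing each new-task mini-batch with a sample from such a buffer buys Stability at some cost in Plasticity and memory.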
1 code implementation • 13 Apr 2022 • Griffin Adams, Han-Chin Shing, Qing Sun, Christopher Winestock, Kathleen McKeown, Noémie Elhadad
In real-world scenarios with naturally occurring datasets, reference summaries are noisy and may contain information that cannot be inferred from the source text.
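One common way to surface such unsupported content is an entailment check of each reference sentence against the source; a hedged sketch with an off-the-shelf NLI model follows (the checkpoint and threshold are illustrative, and the paper's revision approach goes beyond simple filtering).

```python
# Hedged sketch: flag reference sentences not entailed by the source using
# an off-the-shelf NLI model. Checkpoint and threshold are illustrative;
# the paper's revision model goes beyond filtering.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"             # illustrative NLI checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def entailment_prob(premise: str, hypothesis: str) -> float:
    inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # label order for this checkpoint: contradiction, neutral, entailment
    return probs[2].item()

def is_unsupported(source: str, ref_sentence: str, thresh: float = 0.5) -> bool:
    return entailment_prob(source, ref_sentence) < thresh
```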
no code implementations • 29 Sep 2021 • Qing Sun
Deep neural networks have achieved impressive performance across a variety of domains.
no code implementations • 29 Sep 2021 • Qing Sun, Fan Lyu, Fanhua Shang, Wei Feng, Liang Wan
Traditionally, the primary goal of lifelong learning (LL) is to balance the trade-off between Stability (remembering past tasks) and Plasticity (adapting to new tasks).
no code implementations • Findings (ACL) 2021 • Qing Sun, Parminder Bhatia
Our gazetteer-based fusion model is data-efficient, achieving +1.7 micro-F1 on the i2b2 dataset using only 20% of the training data, and +4.7 micro-F1 on novel entity mentions never seen during training.
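A minimal sketch of the general gazetteer-fusion idea, assuming binary per-token gazetteer match features concatenated with contextual encoder states (hypothetical shapes and names; the paper's fusion architecture differs in detail):

```python
# Illustrative gazetteer-fusion sketch for NER: binary gazetteer-match
# features are embedded and fused with contextual token representations.
# Hypothetical shapes/names, not the paper's exact architecture.
import torch
import torch.nn as nn

class GazetteerFusion(nn.Module):
    def __init__(self, hidden: int = 768, n_gaz: int = 4, n_tags: int = 9):
        super().__init__()
        self.gaz_proj = nn.Linear(n_gaz, hidden)   # embed gazetteer hits
        self.fuse = nn.Linear(2 * hidden, hidden)  # fuse both views
        self.tagger = nn.Linear(hidden, n_tags)    # per-token tag logits

    def forward(self, token_reps, gaz_feats):
        # token_reps: (batch, seq, hidden) from a pretrained encoder
        # gaz_feats:  (batch, seq, n_gaz) 0/1 matches per gazetteer type
        g = torch.relu(self.gaz_proj(gaz_feats.float()))
        h = torch.relu(self.fuse(torch.cat([token_reps, g], dim=-1)))
        return self.tagger(h)
```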
1 code implementation • EMNLP 2020 • Kristjan Arumae, Qing Sun, Parminder Bhatia
However, to achieve state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction, additional in-domain pre-training is required.
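A hedged sketch of such additional in-domain pre-training with the Hugging Face Trainer, continuing masked-LM training on an in-domain corpus (the file path and hyperparameters are illustrative, not the paper's configuration):

```python
# Continue masked-LM pre-training of a general-domain encoder on an
# in-domain corpus. "clinical_notes.txt" and all hyperparameters are
# illustrative placeholders.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

corpus = load_dataset("text", data_files={"train": "clinical_notes.txt"})
tokenized = corpus["train"].map(
    lambda b: tok(b["text"], truncation=True, max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()
```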
no code implementations • 23 Aug 2020 • Qing Sun, James Cross
In this paper, we provide an in-depth analysis of KL-divergence minimization in the forward and backward directions, showing that the learner is reinforced via on-policy learning when the backward direction is minimized.
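Concretely, the two directions differ in which distribution the expectation is taken under; a toy illustration:

```python
# Forward vs. backward KL between teacher p and student q (categorical).
# Forward KL(p||q) averages under the teacher; backward KL(q||p) averages
# under the student's own samples, which gives it an on-policy flavor.
import torch

p = torch.tensor([0.7, 0.2, 0.1])   # teacher distribution
q = torch.tensor([0.4, 0.4, 0.2])   # student distribution

forward_kl = (p * (p / q).log()).sum()    # E_{x~p}[log p(x)/q(x)]
backward_kl = (q * (q / p).log()).sum()   # E_{x~q}[log q(x)/p(x)]
print(f"KL(p||q)={forward_kl:.4f}  KL(q||p)={backward_kl:.4f}")
```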
no code implementations • 25 Sep 2019 • Qing Sun, James Cross, Dmitriy Genzel
Sequence-to-sequence models such as Transformers, now used in a wide variety of NLP tasks, typically need very high capacity to perform well.
no code implementations • CVPR 2017 • Qing Sun, Stefan Lee, Dhruv Batra
We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies.
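For reference, standard left-to-right beam search looks like the sketch below; the paper's contribution is extending this to also reason about backward dependencies. The `log_probs` scorer here is a hypothetical stand-in for a trained model.

```python
# Standard left-to-right beam search, shown for reference; the paper
# extends this to bidirectional models. `log_probs(prefix)` is a
# hypothetical scoring function returning token -> log probability.
def beam_search(log_probs, vocab, beam_size=3, max_len=10, eos="</s>"):
    beams = [([], 0.0)]                     # (token sequence, log score)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:      # finished hypotheses carry over
                candidates.append((seq, score))
                continue
            scores = log_probs(seq)
            for tok in vocab:
                candidates.append((seq + [tok], score + scores[tok]))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0]                         # best (sequence, score) pair

# Toy usage with a uniform scorer over a 3-token vocabulary:
import math
vocab = ["a", "b", "</s>"]
uniform = lambda seq: {t: math.log(1 / 3) for t in vocab}
print(beam_search(uniform, vocab))
```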
24 code implementations • 7 Oct 2016 • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.
no code implementations • NeurIPS 2015 • Qing Sun, Dhruv Batra
This paper formulates the search for a set of bounding boxes (as needed in object proposal generation) as a monotone submodular maximization problem over the space of all possible bounding boxes in an image.
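Such a formulation admits the classic greedy algorithm, which achieves a (1 - 1/e) approximation for monotone submodular maximization under a cardinality constraint; a generic sketch with a toy coverage objective standing in for the box-set function:

```python
# Generic greedy maximization of a monotone submodular set function under
# a cardinality constraint, with the classic (1 - 1/e) guarantee
# (Nemhauser et al., 1978). The coverage objective below is a toy
# stand-in for a bounding-box objective.
def greedy_max(f, ground_set, k):
    selected = []
    for _ in range(k):
        best = max((e for e in ground_set if e not in selected),
                   key=lambda e: f(selected + [e]) - f(selected))
        selected.append(best)
    return selected

boxes = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}, "D": {1, 5, 6}}

def f(S):  # coverage over "pixels": monotone and submodular
    covered = set()
    for b in S:
        covered |= boxes[b]
    return len(covered)

print(greedy_max(f, list(boxes), k=2))  # -> ['A', 'D']
```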