no code implementations • 5 May 2025 • Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou
Building helpful and harmless large language models (LLMs) requires an effective model alignment approach based on human instructions and feedback, which in turn necessitates high-quality human-labeled data.
no code implementations • 19 Apr 2025 • Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder, Angela Zhang, Ben Athiwaratkun, Shuaiwen Leon Song, David Ouyang, James Zou
On these benchmarks, the 2B model achieves gains of 99.1% and 98.1%, while the 7B model shows gains of 22.5% and 52.1%, respectively, demonstrating the models' ability to generalize and perform biomedical video understanding on cleaner and more standardized datasets than those seen during training.
no code implementations • 18 Apr 2025 • Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou
There is intense interest in investigating how inference-time compute (ITC) (e.g., repeated sampling and refinement) can improve large language model (LLM) capabilities.
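A minimal sketch of one such ITC strategy, repeated sampling with a majority vote over answers; `generate` here is a hypothetical stand-in for any sampled LLM call:

```python
import random
from collections import Counter

def repeated_sampling(generate, prompt, n_samples=16):
    """Majority vote over independent samples (self-consistency).

    `generate` is any callable returning one sampled answer, e.g. a
    wrapper around an LLM API called at temperature > 0.
    """
    answers = [generate(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy usage: a noisy "model" that answers correctly 70% of the time.
toy = lambda p: "42" if random.random() < 0.7 else "41"
print(repeated_sampling(toy, "What is 6 * 7?"))
```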
no code implementations • 17 Apr 2025 • Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang
Large Language Models (LLMs) struggle with long-context reasoning, not only due to the quadratic scaling of computational complexity with sequence length but also because of the scarcity and expense of annotating long-context data.
1 code implementation • 11 Jan 2025 • Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao
Our insight is that in addition to systems optimization, one can also redesign the model architecture to decouple communication from computation.
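A hedged PyTorch sketch of that idea: launch each block's tensor-parallel all-reduce asynchronously, and let the next block compute on a residual stream that lags one block behind while the communication is in flight. The block structure is illustrative, not the paper's exact architecture.

```python
import torch.distributed as dist

def decoupled_forward(x, blocks):
    # Each block reads a residual stream missing the previous block's
    # reduced output, so its compute overlaps that block's all-reduce.
    residual, pending = x, None
    for block in blocks:
        out = block(residual)                    # per-rank local compute
        handle = dist.all_reduce(out, async_op=True)
        if pending is not None:
            prev_handle, prev_out = pending
            prev_handle.wait()                   # overlapped with block(...)
            residual = residual + prev_out
        pending = (handle, out)
    handle, out = pending
    handle.wait()
    return residual + out
```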
1 code implementation • 19 Nov 2024 • Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang
In addition, we release RedPajama-V2, a massive web-only dataset consisting of raw, unfiltered text data together with quality signals and metadata.
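A short sketch of streaming the dataset with the Hugging Face `datasets` library; the `sample` configuration name follows the dataset card and may differ:

```python
from datasets import load_dataset

# Stream a small sample split rather than downloading the full corpus;
# each document carries raw text plus quality-signal/metadata fields.
ds = load_dataset("togethercomputer/RedPajama-Data-V2",
                  name="sample", split="train", streaming=True)
print(next(iter(ds)).keys())
```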
1 code implementation • 26 Aug 2024 • James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun
Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory movement required for matrix multiplications during the forward pass.
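A minimal sketch of why sparsity helps: when post-activation values are exactly zero, the matching weight rows never need to be read.

```python
import torch

def sparse_matmul(x, W):
    """x: (d_in,) activations with many exact zeros; W: (d_in, d_out).
    Only rows of W for nonzero activations are gathered, cutting
    memory movement roughly in proportion to the sparsity level."""
    idx = torch.nonzero(x, as_tuple=True)[0]
    return x[idx] @ W[idx]

x = torch.relu(torch.randn(4096))        # ReLU leaves ~half exactly zero
W = torch.randn(4096, 11008)
assert torch.allclose(sparse_matmul(x, W), x @ W, atol=1e-3)
```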
no code implementations • 10 Jun 2024 • Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun
A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models.
3 code implementations • 7 Jun 2024 • Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou
With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction.
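A sketch of the layered Mixture-of-Agents pattern, assuming each model is a plain `prompt -> str` callable:

```python
def mixture_of_agents(prompt, proposer_layers, aggregator):
    """Each layer of proposer models sees the previous layer's answers
    as auxiliary context; a final aggregator synthesizes the last
    layer's outputs into a single response."""
    responses = []
    for layer in proposer_layers:
        context = "\n\n".join(f"[Response {i + 1}]\n{r}"
                              for i, r in enumerate(responses))
        augmented = (f"{prompt}\n\nPrevious responses:\n{context}"
                     if context else prompt)
        responses = [model(augmented) for model in layer]
    candidates = "\n\n".join(responses)
    return aggregator(f"Synthesize one best answer to:\n{prompt}\n\n"
                      f"Candidate responses:\n{candidates}")
```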
1 code implementation • 3 Jun 2024 • Rahul Thapa, Kezhen Chen, Ian Covert, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou
Recent advances in vision-language models (VLMs) have demonstrated the advantages of processing images at higher resolutions and utilizing multi-crop features to preserve native resolution details.
Ranked #150 on Visual Question Answering on MM-Vet
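A rough sketch of multi-crop feature extraction, assuming a generic patch-token `encoder`; this illustrates the general recipe rather than the paper's exact pipeline:

```python
import torch
import torch.nn.functional as F

def multi_crop_features(image, encoder, crop=336):
    """image: (3, H, W) with H, W multiples of `crop`;
    encoder: (N, 3, crop, crop) -> (N, T, D) patch tokens.
    Encode one downscaled global view plus native-resolution tiles,
    then concatenate all token sequences."""
    _, H, W = image.shape
    global_view = F.interpolate(image[None], size=(crop, crop),
                                mode="bilinear")
    tiles = [image[:, i:i + crop, j:j + crop]
             for i in range(0, H, crop) for j in range(0, W, crop)]
    views = torch.cat([global_view, torch.stack(tiles)])
    return encoder(views).flatten(0, 1)       # ((1 + n_tiles) * T, D)
```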
no code implementations • 13 Mar 2024 • Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang
This study introduces bifurcated attention, a method designed to enhance language model inference in shared-context batch decoding scenarios.
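A minimal tensor-level sketch of the split: one GEMM against the shared-prefix KV, stored once for the whole batch, and one against each sample's own decoded KV, with a single softmax over the union. Shapes are illustrative.

```python
import torch

def bifurcated_attention(q, k_shared, v_shared, k_dec, v_dec):
    """q: (B, H, d) current-step queries; k/v_shared: (S, H, d) shared
    prompt KV stored once (not replicated per sample); k/v_dec:
    (B, H, T, d) per-sample decoded KV."""
    d = q.shape[-1]
    s_shared = torch.einsum("bhd,shd->bhs", q, k_shared) / d**0.5
    s_dec = torch.einsum("bhd,bhtd->bht", q, k_dec) / d**0.5
    attn = torch.softmax(torch.cat([s_shared, s_dec], dim=-1), dim=-1)
    w_shared, w_dec = attn.split([k_shared.shape[0], k_dec.shape[2]], dim=-1)
    return (torch.einsum("bhs,shd->bhd", w_shared, v_shared)
            + torch.einsum("bht,bhtd->bhd", w_dec, v_dec))
```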
no code implementations • 13 Mar 2024 • Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Rob Kwiatowski, Ramesh Nallapati, Bing Xiang
Generative models, widely utilized in various applications, can often struggle with prompts corresponding to partial tokens.
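A heavily simplified sketch of one remedy, aligning the trailing characters of the prompt with the token vocabulary; a full implementation must also constrain later decoding steps when the first emitted token only partially covers the fragment:

```python
def align_partial_token(prompt, vocab):
    """Find the longest trailing fragment of the prompt and keep only
    vocabulary tokens consistent with it, so the first sampled token
    completes the fragment instead of starting fresh."""
    best = (prompt, set(vocab))           # fallback: no fragment matched
    for k in range(1, min(len(prompt), 16) + 1):
        frag = prompt[-k:]
        allowed = {t for t in vocab
                   if t.startswith(frag) or frag.startswith(t)}
        if allowed:
            best = (prompt[:-k], allowed)
    return best   # (prefix to tokenize normally, allowed first tokens)

vocab = ["def", "in", "ind", "index", "indent", "dex", "ex", "x"]
print(align_partial_token("return ind", vocab))
# -> ('return ', {'in', 'ind', 'index', 'indent'})
```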
no code implementations • 9 Mar 2023 • Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang
Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as a substantial carbon footprint.
2 code implementations • 26 Oct 2022 • Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang
Using these benchmarks, we assess the performance of code generation models in a multi-lingual fashion, and we find generalization to out-of-domain languages, advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings.
no code implementations • 11 May 2021 • Yang Li, Ben Athiwaratkun, Cicero Nogueira dos santos, Bing Xiang
In this work, we propose to leverage the prior information embedded in pretrained language models (LM) to improve generalization for intent classification and slot labeling tasks with limited training data.
no code implementations • EMNLP 2021 • Dheeru Dua, Cicero Nogueira dos santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh
Compositional reasoning tasks, such as multi-hop question answering, require making latent decisions to arrive at the final answer, given a question.
2 code implementations • ICLR 2021 • Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos santos, Bing Xiang, Stefano Soatto
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking.
Ranked #3 on Relation Classification on TACRED
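An illustrative input/output pair in the augmented-language format, paraphrasing the paper's running example; a pretrained seq2seq model is fine-tuned to produce the bracketed target, and structures are recovered by parsing it:

```python
source = "Tolkien's epic novel The Lord of the Rings was published in 1954."
target = ("[ Tolkien | person ]'s epic novel "
          "[ The Lord of the Rings | book | author = Tolkien ] "
          "was published in 1954.")
# Each bracket carries an entity span, its type, and optionally a
# relation pointing at another entity ("author = Tolkien").
```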
no code implementations • EMNLP 2020 • Ben Athiwaratkun, Cicero Nogueira dos santos, Jason Krone, Bing Xiang
We set a new state-of-the-art for few-shot slot labeling, improving substantially upon the previous 5-shot ($75.0\% \rightarrow 90.9\%$) and 1-shot ($70.4\% \rightarrow 81.0\%$) state-of-the-art results.
2 code implementations • ICLR 2019 • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters.
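A minimal Pi-model-style sketch of such a consistency loss over input perturbations, applied to unlabeled data alongside the usual supervised loss:

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, noise_std=0.1):
    """Penalize disagreement between predictions on an input and on a
    small perturbation of it (Gaussian noise here, for simplicity)."""
    with torch.no_grad():
        target = F.softmax(model(x), dim=-1)       # clean prediction
    noisy = x + noise_std * torch.randn_like(x)
    return F.mse_loss(F.softmax(model(noisy), dim=-1), target)
```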
1 code implementation • ACL 2018 • Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar
We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information.
2 code implementations • ICLR 2018 • Ben Athiwaratkun, Andrew Gordon Wilson
By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty.
Ranked #2 on Lexical Entailment on HyperLex
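For diagonal Gaussian embeddings, one standard similarity is the expected likelihood kernel, the integral of the product of the two densities, which has a closed form:

```python
import numpy as np

def log_expected_likelihood(mu1, var1, mu2, var2):
    """log of integral N(x; mu1, var1) * N(x; mu2, var2) dx for
    diagonal Gaussians; higher means the two word densities overlap."""
    v = var1 + var2
    diff = mu1 - mu2
    return -0.5 * np.sum(np.log(2 * np.pi * v) + diff * diff / v)
```

Asymmetric divergences such as KL between the two densities can then score entailment, since a specific word's density can sit inside a more general word's broader density.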
2 code implementations • ACL 2017 • Ben Athiwaratkun, Andrew Gordon Wilson
Word embeddings provide point representations of words containing useful semantic information.
2 code implementations • TACL 2018 • Xilun Chen, Yu Sun, Ben Athiwaratkun, Claire Cardie, Kilian Weinberger
To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exists.
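One standard way to implement this kind of adversarial feature alignment is a gradient reversal layer; a sketch follows, though the paper's exact training procedure may differ:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward
    pass, so minimizing the language discriminator's loss pushes the
    shared feature extractor toward language-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

# features = encoder(averaged_embeddings)     # shared across languages
# lang_logits = discriminator(GradReverse.apply(features, 1.0))
```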
no code implementations • 8 Jul 2015 • Ben Athiwaratkun, Keegan Kang
Our results show that CNN feature maps can be used with Random Forests and SVMs to yield classification results that outperform the original CNN.
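A self-contained sketch of that pipeline with synthetic stand-in features (a real run would pool feature maps from a pretrained CNN):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 512))      # stand-in pooled CNN features
labels = rng.integers(0, 10, size=200)

for clf in (RandomForestClassifier(n_estimators=100), SVC(kernel="linear")):
    clf.fit(feats[:150], labels[:150])               # train split
    print(type(clf).__name__, clf.score(feats[150:], labels[150:]))
```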