1 code implementation • 9 Oct 2024 • Yuxian Gu, Li Dong, Hongning Wang, Yaru Hao, Qingxiu Dong, Furu Wei, Minlie Huang
In our experiments, we adopt PDS to select data from CommonCrawl and show that the PDS-selected corpus accelerates the learning of LMs and consistently boosts their performance on a wide range of downstream tasks across various model sizes.
no code implementations • 27 Feb 2024 • Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei
This work studies the general principles of improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance.
1 code implementation • 11 Sep 2023 • Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei
In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics.
2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM) with the new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.
Ranked #13 on Visual Question Answering on ViP-Bench
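As context for the Kosmos-2 entry above, grounding is commonly realized by mapping a bounding box onto a grid of discrete location tokens that are interleaved with the text. Below is a minimal sketch of that mapping; the grid size, the `<loc_i>` token naming, and the surrounding tag format are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch: converting a normalized bounding box into discrete location
# tokens on a patch grid so it can appear inside grounded text.
# GRID and the "<loc_i>" naming are assumptions for illustration.

GRID = 32  # number of bins per image side (assumed)

def box_to_location_tokens(x0, y0, x1, y1, grid=GRID):
    """Map a box with corners in [0, 1] to two location tokens:
    the top-left bin and the bottom-right bin, indexed row-major."""
    def bin_index(x, y):
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        return row * grid + col
    return f"<loc_{bin_index(x0, y0)}>", f"<loc_{bin_index(x1, y1)}>"

# Example: ground the span "a dog" to a box in the lower-left quadrant.
tl, br = box_to_location_tokens(0.05, 0.55, 0.45, 0.95)
print(f"<p>a dog</p><box>{tl}{br}</box>")
```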
1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei
We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.
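The comparison above builds on a "dual form" view: with unnormalized linear attention, attending to demonstration tokens is algebraically equivalent to adding an implicit weight update to the zero-shot computation. The numpy sketch below checks that identity numerically; the shapes and symbol names are illustrative assumptions, not the paper's code.

```python
# Check that linear attention over [demonstrations; query context] equals the
# zero-shot term plus an implicit "weight update" from the demonstrations.
import numpy as np

rng = np.random.default_rng(0)
d = 8                     # hidden size
n_demo, n_query = 5, 3    # demonstration tokens vs. the query's own tokens

W_V, W_K = rng.normal(size=(d, d)), rng.normal(size=(d, d))
X_demo  = rng.normal(size=(d, n_demo))   # demonstration representations
X_query = rng.normal(size=(d, n_query))  # query-context representations
q = rng.normal(size=(d,))                # current query vector

# Linear attention over the concatenated sequence:
X_all = np.concatenate([X_demo, X_query], axis=1)
icl_out = W_V @ X_all @ (W_K @ X_all).T @ q

# Zero-shot part plus the demonstrations' implicit update:
W_zsl   = W_V @ X_query @ (W_K @ X_query).T
delta_W = W_V @ X_demo  @ (W_K @ X_demo).T
dual_out = (W_zsl + delta_W) @ q

print(np.allclose(icl_out, dual_out))  # True: demonstrations act like a weight update
```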
3 code implementations • NeurIPS 2023 • Yaru Hao, Zewen Chi, Li Dong, Furu Wei
Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.
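The actual framework learns a prompt adapter; as a heavily simplified illustration of the generate-and-rerank idea only, the sketch below rewrites a user prompt into candidates and keeps the one a scorer prefers. The modifier list and `score_prompt` are placeholders, not the paper's learned adapter or reward model.

```python
# Simplified prompt-adaptation sketch: propose candidate rewrites of a user
# prompt and keep the highest-scoring one under a (placeholder) scorer.
from typing import Callable, List

MODIFIERS: List[str] = [
    "highly detailed", "sharp focus", "cinematic lighting",
]

def adapt_prompt(user_prompt: str, score_prompt: Callable[[str], float]) -> str:
    """Return the candidate rewrite with the highest score."""
    candidates = [user_prompt] + [f"{user_prompt}, {m}" for m in MODIFIERS]
    return max(candidates, key=score_prompt)

# Toy scorer standing in for a learned preference model.
toy_scorer = lambda p: len(p.split(","))
print(adapt_prompt("a cat sitting on a windowsill", toy_scorer))
```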
1 code implementation • 13 Dec 2022 • Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei
Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.
no code implementations • 24 Nov 2022 • Yiqiao Jin, Xiting Wang, Yaru Hao, Yizhou Sun, Xing Xie
In this paper, we move towards combining large parametric models with non-parametric prototypical networks.
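For the non-parametric prototypical component mentioned above, a class is represented by the mean embedding of its support examples and a new example is assigned to the nearest prototype. A minimal sketch follows; the random embeddings stand in for representations from a large parametric encoder, which is an assumption for illustration.

```python
# Minimal prototypical-classification sketch: class prototypes are mean
# embeddings of support examples; prediction is nearest-prototype.
import numpy as np

def prototypes(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Average the embeddings of each class to form its prototype."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(x: np.ndarray, protos: dict) -> int:
    """Pick the class whose prototype is closest in Euclidean distance."""
    return min(protos, key=lambda c: np.linalg.norm(x - protos[c]))

rng = np.random.default_rng(0)
support_emb = rng.normal(size=(12, 16))       # embeddings from an (assumed) encoder
support_lab = np.repeat([0, 1, 2], 4)         # 3 classes, 4 support examples each
protos = prototypes(support_emb, support_lab)
print(classify(rng.normal(size=16), protos))  # predicted class id
```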
1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
Ranked #2 on Image Captioning on nocaps val
1 code implementation • 20 May 2022 • Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei
In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates and demonstration permutations.
no code implementations • Findings (ACL) 2021 • Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei
Moreover, we propose to use a focal loss for the generator in order to alleviate the oversampling of correct tokens as replacements.
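The focal loss down-weights easy, high-confidence predictions so that tokens the generator already gets right contribute less to training. A minimal sketch is below; the choice of gamma is a hyperparameter assumption (2.0 is the common default), and the toy probabilities are illustrative.

```python
# Focal loss over the probability assigned to the true token:
# FL(p) = -(1 - p)^gamma * log(p); gamma = 0 recovers cross-entropy.
import numpy as np

def focal_loss(p_true: np.ndarray, gamma: float = 2.0) -> np.ndarray:
    """Per-token focal loss given the probability of the correct token."""
    p_true = np.clip(p_true, 1e-12, 1.0)
    return -((1.0 - p_true) ** gamma) * np.log(p_true)

probs = np.array([0.99, 0.6, 0.1])   # easy, medium, hard tokens
print(focal_loss(probs))             # the easy token's loss is sharply suppressed
print(focal_loss(probs, gamma=0.0))  # plain cross-entropy for comparison
```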
3 code implementations • ACL 2022 • Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei
In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons.
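The attribution idea behind knowledge neurons scales an FFN intermediate activation from zero up to its observed value and accumulates the gradients of the model's probability for the correct answer (an integrated-gradients-style score). The sketch below approximates this on a toy readout; the sigmoid readout and finite-difference gradients are stand-ins for a real pretrained Transformer and its autograd.

```python
# Toy integrated-gradients-style attribution for one FFN "neuron":
# scale its activation from 0 to its observed value and accumulate
# approximate gradients of P(correct answer).
import numpy as np

rng = np.random.default_rng(0)
d_ff = 6
activation = rng.uniform(0.0, 2.0, size=d_ff)   # observed FFN activations
w_readout = rng.normal(size=d_ff)

def prob_correct(a: np.ndarray) -> float:
    """Toy stand-in for P(correct answer | activations)."""
    return 1.0 / (1.0 + np.exp(-w_readout @ a))

def neuron_attribution(a: np.ndarray, i: int, steps: int = 50) -> float:
    """Riemann approximation of integrated gradients along a_i in [0, a_i]."""
    total, eps = 0.0, 1e-5
    for k in range(1, steps + 1):
        scaled = a.copy()
        scaled[i] = a[i] * k / steps
        bumped = scaled.copy()
        bumped[i] += eps
        total += (prob_correct(bumped) - prob_correct(scaled)) / eps  # finite-diff grad
    return a[i] * total / steps

scores = [neuron_attribution(activation, i) for i in range(d_ff)]
print(np.argmax(scores))  # index of the most "knowledgeable" toy neuron
```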
no code implementations • AACL 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks.
2 code implementations • 23 Apr 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input.
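For reference alongside the entry above, multi-head self-attention projects the tokens to queries, keys, and values per head, mixes tokens with softmax(QK^T / sqrt(d_k)) V, then concatenates the heads and projects back. A minimal numpy sketch, with illustrative dimensions:

```python
# Minimal multi-head self-attention: per-head scaled dot-product attention,
# then concatenation and an output projection.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads):
    n, d = X.shape
    d_k = d // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for h in range(n_heads):
        s = slice(h * d_k, (h + 1) * d_k)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)  # token-to-token dependencies
        heads.append(softmax(scores) @ V[:, s])      # contextualized values
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(0)
n_tokens, d_model, n_heads = 4, 8, 2
X = rng.normal(size=(n_tokens, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
print(multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads).shape)  # (4, 8)
```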
no code implementations • IJCNLP 2019 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks.