Search Results for author: Yaru Hao

Found 14 papers, 10 papers with code

Large Language Model for Science: A Study on P vs. NP

1 code implementation • 11 Sep 2023 • Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei

In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics.

Language Modelling, Large Language Model

Kosmos-2: Grounding Multimodal Large Language Models to the World

2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.

Image Captioning, Language Modelling, +7

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers

1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei

We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.

Open-Ended Question Answering

Optimizing Prompts for Text-to-Image Generation

2 code implementations • NeurIPS 2023 • Yaru Hao, Zewen Chi, Li Dong, Furu Wei

Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.

Language Modelling, Prompt Engineering, +1

Structured Prompting: Scaling In-Context Learning to 1,000 Examples

1 code implementation • 13 Dec 2022 • Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei

Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.

Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes

no code implementations • 24 Nov 2022 • Yiqiao Jin, Xiting Wang, Yaru Hao, Yizhou Sun, Xing Xie

In this paper, we move towards combining large parametric models with non-parametric prototypical networks.

Language Models are General-Purpose Interfaces

1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Causal Language Modeling, Few-Shot Learning, +4

Prototypical Calibration for Few-shot Learning of Language Models

1 code implementation • 20 May 2022 • Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei

In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates and demonstration permutations.

Few-Shot Learning

Learning to Sample Replacements for ELECTRA Pre-Training

no code implementations • Findings (ACL) 2021 • Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei

Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements.

Language Modelling, Masked Language Modeling

Knowledge Neurons in Pretrained Transformers

3 code implementations • ACL 2022 • Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei

In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons.

Investigating Learning Dynamics of BERT Fine-Tuning

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu

The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks.

Language Modelling

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

2 code implementations • 23 Apr 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu

The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input.

Visualizing and Understanding the Effectiveness of BERT

no code implementations • IJCNLP 2019 • Yaru Hao, Li Dong, Furu Wei, Ke Xu

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks.

Language Modelling
