Search Results for author: Tianyu Gao

Found 35 papers, 26 papers with code

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

no code implementations ICML 2020 Meng Qu, Tianyu Gao, Louis-Pascal Xhonneux, Jian Tang

This paper studies few-shot relation extraction, which aims at predicting the relation for a pair of entities in a sentence by training with a few labeled examples in each relation.

Graph Neural Network Meta-Learning +3

Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients

no code implementations 3 May 2025 Yezhen Wang, Zhouhao Yang, Brian K Chen, Fanyi Pu, Bo Li, Tianyu Gao, Kenji Kawaguchi

In this work, we propose a novel framework, VLoRP, that extends low-rank gradient projection by introducing an additional degree of freedom for controlling the trade-off between memory efficiency and performance, beyond the rank hyper-parameter.

GSM8K MMLU
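
For context, a minimal sketch of the low-rank gradient projection idea that VLoRP builds on (the GaLore-style baseline); the additional projection-granularity knob the paper introduces is not modeled here:

```python
import torch

def project_gradient(grad: torch.Tensor, rank: int):
    """Project a full gradient (m x n) onto a rank-r subspace via SVD,
    so optimizer state is kept at r x n instead of m x n.
    A sketch of low-rank gradient projection, not VLoRP's exact scheme."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                  # m x r projection basis
    return P, P.T @ grad             # r x n low-rank gradient

def project_back(P: torch.Tensor, low_rank_update: torch.Tensor) -> torch.Tensor:
    """Map the optimizer's low-rank update back to the full parameter space."""
    return P @ low_rank_update       # m x n

# Toy usage: the optimizer (e.g., Adam) would run on the r x n gradient.
G = torch.randn(1024, 1024)
P, g_lr = project_gradient(G, rank=16)
full_update = project_back(P, -0.01 * g_lr)
```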

LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

no code implementations 9 Jan 2025 Xi Ye, Fangcong Yin, Yinghui He, Joie Zhang, Howard Yen, Tianyu Gao, Greg Durrett, Danqi Chen

We evaluated 23 LCLMs, including instruction-tuned models and recent reasoning models, on LongProc at three difficulty levels, with the maximum number of output tokens set at 500, 2K, and 8K.


Metadata Conditioning Accelerates Language Model Pre-training

1 code implementation 3 Jan 2025 Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen

The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential for developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified by each of these heterogeneous data sources is challenging.

Language Modeling Language Modelling +1
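
A minimal sketch of the recipe: prepend metadata (e.g., the source URL) to each pre-training document, then drop the prefix during a final cooldown phase so the model also works without metadata at inference. Field names are illustrative, not the paper's exact format:

```python
def with_metadata(doc: dict, cooldown: bool = False) -> str:
    """Prefix a pre-training document with its metadata (here, a source URL).
    During the cooldown phase the prefix is dropped, so the trained model
    does not depend on metadata being present at inference time."""
    if cooldown:
        return doc["text"]
    return f"URL: {doc['url']}\n\n{doc['text']}"

example = {"url": "en.wikipedia.org", "text": "The Transformer is a neural network architecture ..."}
print(with_metadata(example))                  # conditioned: "URL: en.wikipedia.org ..."
print(with_metadata(example, cooldown=True))   # cooldown: raw text only
```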

How to Train Long-Context Language Models (Effectively)

1 code implementation 3 Oct 2024 Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen

We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.

HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly

1 code implementation 3 Oct 2024 Howard Yen, Tianyu Gao, Minmin Hou, Ke Ding, Daniel Fleischer, Peter Izsak, Moshe Wasserblat, Danqi Chen

There have been many benchmarks for evaluating long-context language models (LCLMs), but developers often rely on synthetic tasks like needle-in-a-haystack (NIAH) or arbitrary subsets of tasks.

RAG

LitSearch: A Retrieval Benchmark for Scientific Literature Search

1 code implementation 10 Jul 2024 Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, Tianyu Gao

LitSearch is constructed using a combination of (1) questions generated by GPT-4 based on paragraphs containing inline citations from research papers and (2) questions manually written by authors about their recently published papers.

Reranking Retrieval

Long-Context Language Modeling with Parallel Context Encoding

1 code implementation 26 Feb 2024 Howard Yen, Tianyu Gao, Danqi Chen

However, the substantial computational cost of transformers and limited generalization of positional encoding restrict the size of their context window.

In-Context Learning Instruction Following +2
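
The gist, as a sketch: encode chunks of the long context independently (so encoding cost grows linearly with length) and let the decoder cross-attend to the concatenated chunk states. The tiny encoder and dimensions below are illustrative, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

class ParallelContextEncoder(nn.Module):
    """Encode a long context in independent chunks; each chunk attends only
    within itself, and the concatenated states serve as keys/values for a
    decoder's cross-attention (decoder not shown)."""

    def __init__(self, d_model: int = 64, chunk_len: int = 128):
        super().__init__()
        self.chunk_len = chunk_len
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ctx_embeds: torch.Tensor) -> torch.Tensor:
        chunks = ctx_embeds.split(self.chunk_len, dim=1)   # parallel, independent
        return torch.cat([self.encoder(c) for c in chunks], dim=1)

memory = ParallelContextEncoder()(torch.randn(1, 512, 64))   # (1, 512, 64)
```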

Improving Language Understanding from Screenshots

1 code implementation 21 Feb 2024 Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen

An emerging family of language models (LMs), capable of processing both text and images within a single visual view, promises to unlock complex tasks such as chart understanding and UI navigation.

Chart Understanding

Harmonizing SO(3)-Equivariance with Neural Expressiveness: a Hybrid Deep Learning Framework Oriented to the Prediction of Electronic Structure Hamiltonian

no code implementations 1 Jan 2024 Shi Yin, Xinyang Pan, Xudong Zhu, Tianyu Gao, Haochong Zhang, Feng Wu, Lixin He

Deep learning for predicting the electronic structure Hamiltonian of quantum systems must satisfy covariance laws; among these, achieving SO(3)-equivariance without sacrificing the non-linear expressive capability of networks remains unsolved.

Navigate regression
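
For reference, the constraint the abstract calls SO(3)-equivariance can be stated in one line (the general condition, not this paper's specific parameterization):

```latex
% Rotating the input must transform the output by the matching group
% representation (e.g., Wigner-D matrices for spherical tensor features):
f\bigl(\rho_{\mathrm{in}}(R)\,x\bigr) \;=\; \rho_{\mathrm{out}}(R)\,f(x),
\qquad \forall\, R \in \mathrm{SO}(3).
```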

Evaluating Large Language Models at Evaluating Instruction Following

1 code implementation 11 Oct 2023 Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen

As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluation for comparing the ever-increasing list of models.

Instruction Following

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

2 code implementations 10 Oct 2023 Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen

In this work, we study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.

Language Modeling Language Modelling +2
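
To make "structured pruning" concrete, here is a deliberately simple magnitude-based head-pruning sketch. Note that Sheared LLaMA itself learns pruning masks toward a target architecture and pairs pruning with dynamic batch loading during continued pre-training, neither of which this toy heuristic captures:

```python
import torch

def prune_heads_by_norm(w_out: torch.Tensor, num_heads: int, keep: int) -> torch.Tensor:
    """Remove whole attention heads (structured pruning) from an output
    projection by dropping the heads whose weight slices have the smallest norm."""
    head_dim = w_out.shape[1] // num_heads
    heads = w_out.view(w_out.shape[0], num_heads, head_dim)
    scores = heads.norm(dim=(0, 2))                       # one score per head
    keep_idx = scores.topk(keep).indices.sort().values    # preserve head order
    return heads[:, keep_idx, :].reshape(w_out.shape[0], keep * head_dim)

W_o = torch.randn(512, 512)                               # toy 8-head projection
print(prune_heads_by_norm(W_o, num_heads=8, keep=6).shape)   # (512, 384)
```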

Fine-Tuning Language Models with Just Forward Passes

3 code implementations NeurIPS 2023 Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.

In-Context Learning Multiple-choice
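
The method (MeZO) estimates gradients with two forward passes and antithetic random perturbations (SPSA), so no backpropagation or activation storage is needed. A simplified sketch; the actual implementation regenerates the perturbation from a seed rather than storing it, to keep memory at inference level:

```python
import torch

def mezo_step(params, loss_fn, lr=1e-2, eps=1e-3, seed=0):
    """One zeroth-order step: g_hat = (L(th + eps*z) - L(th - eps*z)) / (2*eps) * z."""
    torch.manual_seed(seed)
    zs = [torch.randn_like(p) for p in params]
    with torch.no_grad():
        for p, z in zip(params, zs):
            p.add_(eps * z)                   # theta + eps*z
        loss_plus = loss_fn()
        for p, z in zip(params, zs):
            p.sub_(2 * eps * z)               # theta - eps*z
        loss_minus = loss_fn()
        scale = (loss_plus - loss_minus) / (2 * eps)
        for p, z in zip(params, zs):
            p.add_(eps * z)                   # restore theta
            p.sub_(lr * scale * z)            # SGD step on the estimate

# Toy usage: minimize ||w - target||^2 with forward passes only.
w, target = torch.zeros(3), torch.tensor([1.0, -2.0, 0.5])
for step in range(2000):
    mezo_step([w], lambda: ((w - target) ** 2).sum(), lr=0.02, seed=step)
print(w)   # approaches target
```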

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

1 code implementation 16 May 2023 Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen

Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations, but its mechanisms are not yet well-understood.

In-Context Learning
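
The gold-versus-random-label comparison used to separate the two mechanisms can be sketched as prompt construction: above-chance accuracy with random labels can only come from recognizing the task (task recognition), while the gap between gold and random labels reflects learning the input-label mapping (task learning). The template and labels below are illustrative:

```python
import random

def build_icl_prompt(demos, query, label_set, mode="gold"):
    """Few-shot prompt with gold or randomized demonstration labels."""
    lines = []
    for text, label in demos:
        if mode == "random":
            label = random.choice(label_set)   # break the input-label mapping
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A delightful film.", "positive"), ("Dull and overlong.", "negative")]
print(build_icl_prompt(demos, "Sharp writing throughout.", ["positive", "negative"], mode="random"))
```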

The CRINGE Loss: Learning what language not to model

no code implementations 10 Nov 2022 Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples.

Language Modeling Language Modelling
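
A loose, illustrative sketch of the alternative: treat tokens from bad sequences as negatives and contrast each one against a token sampled from the model's own top-k predictions. This follows the spirit of the CRINGE loss but is not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def negative_example_loss(logits: torch.Tensor, negative_ids: torch.Tensor, k: int = 5):
    """logits: (seq_len, vocab); negative_ids: (seq_len,) tokens NOT to model.
    Each negative token is contrasted against a 'positive' token sampled
    from the model's top-k, pushing the negative's score below it."""
    topk = logits.topk(k, dim=-1)
    choice = torch.multinomial(F.softmax(topk.values, dim=-1), 1)
    pos_ids = topk.indices.gather(-1, choice).squeeze(-1)

    pos = logits.gather(-1, pos_ids.unsqueeze(-1)).squeeze(-1)
    neg = logits.gather(-1, negative_ids.unsqueeze(-1)).squeeze(-1)
    pair = torch.stack([pos, neg], dim=-1)                  # binary contrast
    return F.cross_entropy(pair, torch.zeros(logits.shape[0], dtype=torch.long))

loss = negative_example_loss(torch.randn(10, 100), torch.randint(0, 100, (10,)))
```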

Transformer-based dimensionality reduction

no code implementations 15 Oct 2022 Ruisheng Ran, Tianyu Gao, Bin Fang

Recently, the Transformer has become highly popular and plays an important role in the fields of Machine Learning (ML), Natural Language Processing (NLP), and Computer Vision (CV).

Data Visualization Dimensionality Reduction +2

Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models

1 code implementation COLING 2022 Zichun Yu, Tianyu Gao, Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou

Prompting, which casts downstream applications as language modeling tasks, has been shown to be sample-efficient compared to standard fine-tuning with pre-trained models.

Few-Shot Learning Language Modeling +2

Recovering Private Text in Federated Learning of Language Models

1 code implementation 17 May 2022 Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, Danqi Chen

For the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences.

Federated Learning Word Embeddings
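
One well-known building block of such attacks, sketched below: the gradient of the word-embedding matrix is nonzero only in rows of tokens that actually appear in the client batch, leaking a bag of words. The paper's full pipeline then reconstructs fluent sentences from that leak with beam search and reordering, which is not shown here:

```python
import torch

def tokens_leaked_by_embedding_grad(embedding_grad: torch.Tensor) -> list:
    """Return ids of all tokens present in the batch: unused rows get zero grad."""
    return torch.nonzero(embedding_grad.abs().sum(dim=1) > 0).flatten().tolist()

emb = torch.nn.Embedding(100, 16)
emb(torch.tensor([5, 17, 42])).sum().backward()
print(tokens_leaked_by_embedding_grad(emb.weight.grad))   # [5, 17, 42]
```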

Ditch the Gold Standard: Re-evaluating Conversational Question Answering

2 code implementations ACL 2022 Huihan Li, Tianyu Gao, Manan Goenka, Danqi Chen

In this work, we conduct the first large-scale human evaluation of state-of-the-art conversational QA systems, where human evaluators converse with models and judge the correctness of their answers.

Question Rewriting

SimCSE: Simple Contrastive Learning of Sentence Embeddings

23 code implementations EMNLP 2021 Tianyu Gao, Xingcheng Yao, Danqi Chen

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.

Contrastive Learning Data Augmentation +6
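
A minimal usage sketch with the released checkpoints (the model name below is from the authors' Hugging Face releases, and the pooling follows the repository's example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "princeton-nlp/sup-simcse-bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

texts = ["A man is playing a guitar.",
         "Someone plays an instrument.",
         "The stock market fell."]
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    emb = model(**batch).pooler_output            # [CLS]-based embeddings
emb = torch.nn.functional.normalize(emb, dim=-1)
print(emb @ emb.T)   # cosine similarities; the first pair should score highest
```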

Making Pre-trained Language Models Better Few-shot Learners

9 code implementations ACL 2021 Tianyu Gao, Adam Fisch, Danqi Chen

We present LM-BFF (better few-shot fine-tuning of language models), a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples.

Few-Shot Learning Zero-Shot Text Classification
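
A sketch of the prompt-based prediction at the heart of LM-BFF, using the paper's SST-2-style template ("It was [MASK].") with label words "great"/"terrible"; few-shot fine-tuning then updates this same masked-LM head:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base")
label_words = {"positive": " great", "negative": " terrible"}

def prompt_scores(sentence: str) -> dict:
    """Score each label by its label word's logit at the [MASK] position."""
    enc = tok(f"{sentence} It was {tok.mask_token}.", return_tensors="pt")
    mask_pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    return {lab: logits[tok.convert_tokens_to_ids(tok.tokenize(w))[0]].item()
            for lab, w in label_words.items()}

print(prompt_scores("A gorgeous, witty, seductive movie."))
```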

TADO: Time-varying Attention with Dual-Optimizer Model

1 code implementation 8 Dec 2020 Yuexin Wu, Tianyu Gao, Sihao Wang, Zhongmin Xiong

As the first attempt in this field to address this problem, we propose a flexible dual-optimizer model that gains robustness from both a regression loss and a classification loss.

Recommendation Systems

Learning from Context or Names? An Empirical Study on Neural Relation Extraction

1 code implementation EMNLP 2020 Hao Peng, Tianyu Gao, Xu Han, Yankai Lin, Peng Li, Zhiyuan Liu, Maosong Sun, Jie Zhou

We find that (i) while context is the main source to support the predictions, RE models also heavily rely on the information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions and thus contribute to the high performance on RE benchmarks.

Memorization Relation +1

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

1 code implementation 5 Jul 2020 Meng Qu, Tianyu Gao, Louis-Pascal A. C. Xhonneux, Jian Tang

To more effectively generalize to new relations, in this paper we study the relationships between different relations and propose to leverage a global relation graph.

Graph Neural Network Meta-Learning +3

Continual Relation Learning via Episodic Memory Activation and Reconsolidation

no code implementations ACL 2020 Xu Han, Yi Dai, Tianyu Gao, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Continual relation learning aims to continually train a model on new data to learn newly emerging relations while avoiding catastrophic forgetting of old ones.

Continual Learning Relation

FewRel 2.0: Towards More Challenging Few-Shot Relation Classification

1 code implementation IJCNLP 2019 Tianyu Gao, Xu Han, Hao Zhu, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

We present FewRel 2.0, a more challenging task to investigate two aspects of few-shot relation classification models: (1) Can they adapt to a new domain with only a handful of instances?

Classification Domain Adaptation +3

OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction

1 code implementation IJCNLP 2019 Xu Han, Tianyu Gao, Yuan Yao, Deming Ye, Zhiyuan Liu, Maosong Sun

OpenNRE is an open-source and extensible toolkit that provides a unified framework to implement neural models for relation extraction (RE).

Information Retrieval Question Answering +3
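
A usage sketch adapted from the toolkit's documented quick start (checkpoint name and input format as in the README; weights download on first use):

```python
import opennre

model = opennre.get_model('wiki80_cnn_softmax')   # pretrained sentence-level RE model

# Head and tail entities are specified as character spans in the sentence.
result = model.infer({
    'text': 'He was the son of Máel Dúin mac Máele Fithrich, and grandson of '
            'the high king Áed Uaridnach (died 612).',
    'h': {'pos': (18, 46)},
    't': {'pos': (78, 91)},
})
print(result)   # (predicted relation, confidence), e.g. ('father', 0.51)
```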

Neural Snowball for Few-Shot Relation Learning

1 code implementation 29 Aug 2019 Tianyu Gao, Xu Han, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun

To address new relations with few-shot instances, we propose a novel bootstrapping approach, Neural Snowball, to learn new relations by transferring semantic knowledge about existing relations.

Knowledge Graphs Relation +1
