Search Results for author: Lili Yu

Found 15 papers, 8 papers with code

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation 12 Apr 2024 Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.
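As a rough, back-of-the-envelope illustration of the quadratic-cost claim (a sketch with assumed sequence lengths and chunk size, not code from the Megalodon paper):

```python
# Size of the attention score matrix for full self-attention versus
# attention restricted to fixed-size chunks (illustrative only; Megalodon's
# actual mechanism is not shown here).

def full_attention_scores(seq_len: int) -> int:
    """Entries in the n x n score matrix: grows quadratically in n."""
    return seq_len * seq_len

def chunked_attention_scores(seq_len: int, chunk: int = 2048) -> int:
    """Entries when each token attends only within its chunk:
    grows linearly in n for a fixed chunk size."""
    n_chunks = -(-seq_len // chunk)  # ceiling division
    return n_chunks * chunk * chunk

for n in (4_096, 32_768, 262_144):
    print(f"n={n:>7}  full={full_attention_scores(n):>15,}  "
          f"chunked={chunked_attention_scores(n):>12,}")
```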

Jointly Training Large Autoregressive Multimodal Models

1 code implementation 27 Sep 2023 Emanuele Aiello, Lili Yu, Yixin Nie, Armen Aghajanyan, Barlas Oguz

In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning.

Image Generation

LIMA: Less Is More for Alignment

5 code implementations NeurIPS 2023 Chunting Zhou, PengFei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large-scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.

Language Modelling, reinforcement-learning
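A minimal sketch of the two stages above on toy tensors (assumed PyTorch stand-ins, not LIMA's training code): stage 1 applies a next-token loss to raw text, while stage 2 applies the same loss to prompt-response pairs but counts only the response positions.

```python
import torch
import torch.nn.functional as F

vocab, dim = 100, 32
# Stand-in language model: embedding followed by a linear output head.
model = torch.nn.Sequential(torch.nn.Embedding(vocab, dim),
                            torch.nn.Linear(dim, vocab))

def next_token_loss(tokens, loss_mask=None):
    logits = model(tokens[:, :-1])               # predict token t+1 from token t
    loss = F.cross_entropy(logits.transpose(1, 2), tokens[:, 1:], reduction="none")
    if loss_mask is not None:                    # stage 2: ignore prompt positions
        loss_mask = loss_mask[:, 1:]
        return (loss * loss_mask).sum() / loss_mask.sum()
    return loss.mean()

# Stage 1: unsupervised pretraining batch of raw text.
raw_text = torch.randint(0, vocab, (4, 16))
pretrain_loss = next_token_loss(raw_text)

# Stage 2: instruction-tuning batch where the first 8 tokens are the prompt.
pairs = torch.randint(0, vocab, (4, 16))
response_mask = torch.cat([torch.zeros(4, 8), torch.ones(4, 8)], dim=1)
finetune_loss = next_token_loss(pairs, loss_mask=response_mask)
print(pretrain_loss.item(), finetune_loss.item())
```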

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

no code implementations NeurIPS 2023 Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis

Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books.

Density Estimation, Language Modelling
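The title's multiscale idea operates over raw byte sequences; as a loose illustration only (the patch size and zero-padding below are assumptions, not the paper's implementation), a long byte stream can be grouped into fixed-size patches so a coarse model need not attend over every individual byte:

```python
PATCH_SIZE = 8  # assumed patch length, chosen only for illustration

def to_patches(data: bytes, patch_size: int = PATCH_SIZE) -> list[bytes]:
    """Zero-pad to a multiple of patch_size and split into patches."""
    pad = (-len(data)) % patch_size
    data = data + b"\x00" * pad
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]

doc = "Byte-level models can ingest text, images, or audio as raw bytes.".encode()
patches = to_patches(doc)
print(f"{len(doc)} bytes -> {len(patches)} patches of {PATCH_SIZE} bytes")
```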

VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation

no code implementations 4 May 2023 Xilun Chen, Lili Yu, Wenhan Xiong, Barlas Oğuz, Yashar Mehdad, Wen-tau Yih

We propose a new two-stage pre-training framework for video-to-text generation tasks such as video captioning and video question answering. A generative encoder-decoder model is first jointly pre-trained on massive image-text data to learn fundamental vision-language concepts, and is then adapted to video data in an intermediate video-text pre-training stage to learn video-specific skills such as spatio-temporal reasoning.

Question Answering, Text Generation +3
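A skeleton of the two-stage recipe described in the abstract, using a toy stand-in model and random tensors (the module names, feature sizes, and single-token loss are assumptions, not the VideoOFA implementation): stage 1 trains the generative model on image-text pairs, and stage 2 continues training on video-text pairs by treating a clip as a stack of frame features.

```python
import torch
import torch.nn.functional as F

class ToyEncoderDecoder(torch.nn.Module):
    """Stand-in for a generative vision-language encoder-decoder."""
    def __init__(self, dim: int = 32, vocab: int = 100):
        super().__init__()
        self.visual_proj = torch.nn.Linear(dim, dim)
        self.decoder_head = torch.nn.Linear(dim, vocab)

    def caption_loss(self, visual_feats, text_ids):
        # Pool the visual features (a single image row, or many video frames)
        # and score the first caption token; real models decode autoregressively.
        pooled = self.visual_proj(visual_feats).mean(dim=1)
        return F.cross_entropy(self.decoder_head(pooled), text_ids[:, 0])

model = ToyEncoderDecoder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
captions = torch.randint(0, 100, (4, 8))

# Stage 1: joint pre-training on (image, text) pairs.
image_feats = torch.randn(4, 1, 32)
model.zero_grad(); model.caption_loss(image_feats, captions).backward(); opt.step()

# Stage 2: intermediate video-text pre-training on (clip, text) pairs.
video_feats = torch.randn(4, 16, 32)   # 16 frames per clip
model.zero_grad(); model.caption_loss(video_feats, captions).backward(); opt.step()
```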

Scaling Laws for Generative Mixed-Modal Language Models

no code implementations 10 Jan 2023 Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion parameters, trained on 5-100 billion tokens.
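Scaling-law analyses of this kind typically fit a power law relating loss to model size; below is a generic sketch with made-up numbers (the data points and fitted constants are hypothetical, not the paper's reported values).

```python
import numpy as np

# Hypothetical (parameter count, validation loss) measurements spanning
# roughly the 8M-30B range mentioned above.
params = np.array([8e6, 1.3e8, 1.3e9, 6.7e9, 3.0e10])
loss = np.array([4.10, 3.35, 2.80, 2.45, 2.20])

# Fit log(loss) = log(a) - alpha * log(N), i.e. loss ≈ a * N**(-alpha).
slope, intercept = np.polyfit(np.log(params), np.log(loss), deg=1)
alpha, a = -slope, float(np.exp(intercept))
print(f"loss ≈ {a:.2f} * N^(-{alpha:.3f})")
```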

Improving Faithfulness of Abstractive Summarization by Controlling Confounding Effect of Irrelevant Sentences

no code implementations 19 Dec 2022 Asish Ghoshal, Arash Einolghozati, Ankit Arun, Haoran Li, Lili Yu, Vera Gor, Yashar Mehdad, Scott Wen-tau Yih, Asli Celikyilmaz

Lack of factual correctness is an issue that still plagues state-of-the-art summarization systems despite their impressive progress on generating seemingly fluent summaries.

Abstractive Text Summarization

Nutri-bullets Hybrid: Consensual Multi-document Summarization

no code implementations NAACL 2021 Darsh Shah, Lili Yu, Tao Lei, Regina Barzilay

We present a method for generating comparative summaries that highlight similarities and contradictions in input documents.

Document Summarization, Language Modelling +3

Nutribullets Hybrid: Multi-document Health Summarization

2 code implementations 8 Apr 2021 Darsh J Shah, Lili Yu, Tao Lei, Regina Barzilay

We present a method for generating comparative summaries that highlight similarities and contradictions in input documents.

Language Modelling, Nutrition +1

Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

1 code implementation ACL 2020 Kyle Swanson, Lili Yu, Tao Lei

Selecting input features of top relevance has become a popular method for building self-explaining models.

Text Matching
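The title refers to sparse alignments learned via optimal transport; below is a generic entropy-regularized Sinkhorn sketch that softly aligns two small sets of token embeddings (an illustration of plain OT with uniform marginals, not the paper's constrained variants that yield truly sparse alignments).

```python
import numpy as np

def sinkhorn(cost: np.ndarray, reg: float = 0.1, iters: int = 200) -> np.ndarray:
    """Entropy-regularized OT plan with uniform row/column marginals."""
    n, m = cost.shape
    K = np.exp(-cost / reg)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))   # embeddings of tokens from text A
y = rng.normal(size=(7, 16))   # embeddings of tokens from text B
# Cosine-distance cost matrix between the two token sets.
cost = 1.0 - (x @ y.T) / (np.linalg.norm(x, axis=1, keepdims=True)
                          * np.linalg.norm(y, axis=1))
plan = sinkhorn(cost)
print(plan.shape, round(plan.sum(), 3))   # alignment weights summing to ~1.0
```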

On the evolution of word usage of classical Chinese poetry

no code implementations 10 Sep 2015 Liang Liu, Lili Yu

The primary goal of this study is to provide quantitative evidence of the evolutionary linkages, with emphasis on character usage, among genres of classical Chinese poetry from different periods.
