Search Results for author: Jing Nathan Yan

Found 6 papers, 2 papers with code

MambaByte: Token-free Selective State Space Model

no code implementations • 24 Jan 2024 • Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush

We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences.

Computational Efficiency, Inductive Bias, +1

Diffusion Models Without Attention

no code implementations • 30 Nov 2023 • Jing Nathan Yan, Jiatao Gu, Alexander M. Rush

In recent advancements in high-fidelity image generation, Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a key player.

Denoising, Image Generation

On What Basis? Predicting Text Preference Via Structured Comparative Reasoning

no code implementations • 14 Nov 2023 • Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky

Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning.

Hallucination, Retrieval

Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning

no code implementations • 13 Nov 2023 • Yue Yu, Jiaming Shen, Tianqi Liu, Zhen Qin, Jing Nathan Yan, Jialu Liu, Chao Zhang, Michael Bendersky

To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.

In-Context Learning, Language Modelling, +2

Pretraining Without Attention

1 code implementation • 20 Dec 2022 • Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

Even so, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
