Search Results for author: Junxiong Wang

Found 10 papers, 5 papers with code

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

1 code implementation • 14 Apr 2025 • Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao

In this paper, we introduce M1, a hybrid linear RNN reasoning model built on the Mamba architecture, which enables memory-efficient inference.

Mamba • Math

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

no code implementations • 27 Feb 2025 • Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, Kevin Y. Li, Aviv Bick, J. Zico Kolter, Albert Gu, François Fleuret, Tri Dao

Recent advancements have demonstrated that the performance of large language models (LLMs) can be significantly enhanced by scaling computational resources at test time.

Mamba • Mathematical Reasoning

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

2 code implementations • 27 Aug 2024 • Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch with trillions of tokens in both chat benchmarks and general benchmarks.

GPU • Language Modeling • +2

Entity Disambiguation via Fusion Entity Decoding

no code implementations • 2 Apr 2024 • Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li

Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark.

Ranked #1 on Entity Linking on KORE50 (Micro-F1 strong)

Decoder • Entity Disambiguation • +2

MambaByte: Token-free Selective State Space Model

no code implementations • 24 Jan 2024 • Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush

We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences.

Computational Efficiency • Inductive Bias • +4

JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

1 code implementation • 21 Jul 2023 • Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun

Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost; it is the core NP-hard combinatorial optimization problem of query optimization.

Benchmarking • Combinatorial Optimization • +4

Pretraining Without Attention

1 code implementation • 20 Dec 2022 • Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

Even so, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.

State Space Models

Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback

1 code implementation • 14 Oct 2021 • Junxiong Wang, Debabrota Basu, Immanuel Trummer

In black-box optimization problems, we aim to maximize an unknown objective function that is only accessible through feedback from an evaluation or simulation oracle.
