no code implementations • 26 May 2025 • Jing Nathan Yan, Junxiong Wang, Jeffrey M. Rzeszotarski, Allison Koenecke
The rapid proliferation of recommender systems necessitates robust fairness practices to address inherent biases.
1 code implementation • 14 Apr 2025 • Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao
In this paper, we introduce a novel hybrid linear RNN reasoning model, M1, built on the Mamba architecture, which allows memory-efficient inference.
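For intuition on why a linear RNN enables memory-efficient inference: generation carries a fixed-size recurrent state rather than a key-value cache that grows with sequence length. The sketch below is a minimal illustration of that property with placeholder matrices and dimensions, not M1's actual parameterization.

```python
import numpy as np

# Illustrative linear recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# Unlike attention's O(T) KV cache, the state h is O(1) in sequence length.
d_state, d_model = 16, 8
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(d_state, d_state))  # state transition (placeholder)
B = rng.normal(size=(d_state, d_model))             # input projection (placeholder)
C = rng.normal(size=(d_model, d_state))             # output projection (placeholder)

def generate_step(h, x):
    """One decoding step: update the fixed-size state and emit an output."""
    h = A @ h + B @ x
    return h, C @ h

h = np.zeros(d_state)
for t in range(1000):                 # memory stays constant as t grows
    x = rng.normal(size=d_model)      # stand-in for the current token embedding
    h, y = generate_step(h, x)
```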
no code implementations • 27 Feb 2025 • Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, Kevin Y. Li, Aviv Bick, J. Zico Kolter, Albert Gu, François Fleuret, Tri Dao
Recent advancements have demonstrated that the performance of large language models (LLMs) can be significantly enhanced by scaling computational resources at test time.
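One standard way to spend extra compute at test time is to sample several candidate answers and aggregate them by majority vote (self-consistency). The sketch below illustrates that general recipe; `sample_answer` is a hypothetical stand-in for a stochastic model call, not this paper's method.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one stochastic LLM generation."""
    return random.choice(["42", "42", "41"])  # toy answer distribution

def majority_vote(question: str, n_samples: int = 16) -> str:
    """Spend more test-time compute: sample n answers, return the most common."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))
```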
2 code implementations • 27 Aug 2024 • Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao
The resulting hybrid model, which incorporates a quarter of the attention layers, matches the original Transformer on chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch on trillions of tokens, in both chat and general benchmarks.
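As a rough structural illustration of "a quarter of the attention layers," the sketch below builds a layer schedule that keeps every fourth layer as attention and replaces the rest with Mamba-style blocks. The 4:1 ratio is from the snippet above; the uniform spacing is an assumption for illustration, since the distillation procedure determines which layers are actually kept.

```python
def hybrid_schedule(n_layers: int, attn_fraction: float = 0.25) -> list[str]:
    """Keep roughly attn_fraction of layers as attention, the rest as Mamba.

    Uniform spacing is an illustrative choice, not the paper's exact layout.
    """
    stride = max(1, round(1 / attn_fraction))
    return ["attention" if i % stride == 0 else "mamba" for i in range(n_layers)]

print(hybrid_schedule(8))
# ['attention', 'mamba', 'mamba', 'mamba', 'attention', 'mamba', 'mamba', 'mamba']
```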
no code implementations • 2 Apr 2024 • Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li
Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark.
Ranked #1 on Entity Linking on KORE50 (Micro-F1 strong metric)
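To make the generative-vs-classification contrast concrete: a classifier scores a fixed entity inventory in one shot, while a generative linker decodes the entity name token by token and ranks candidates by sequence likelihood. The sketch below is a hypothetical illustration of that second scheme, with toy scores standing in for a real language model; it is not the paper's model.

```python
def next_token_log_prob(prefix: tuple[str, ...], token: str) -> float:
    """Hypothetical stand-in for an autoregressive LM's next-token score."""
    toy_scores = {"Barack": -0.2, "Obama": -0.1}  # assumed values
    return toy_scores.get(token, -2.0)

def generative_link_score(context: tuple[str, ...], entity: tuple[str, ...]) -> float:
    """Score a candidate entity by the log-likelihood of decoding its name."""
    return sum(
        next_token_log_prob(context + entity[:i], tok)
        for i, tok in enumerate(entity)
    )

candidates = [("Barack", "Obama"), ("Obama", "(", "Japan", ")")]
print(max(candidates, key=lambda e: generative_link_score(("He", "met"), e)))
```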
no code implementations • 24 Jan 2024 • Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences.
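"Token-free" means the vocabulary is simply the 256 possible byte values, so any text maps to a model-ready sequence without a learned tokenizer. A minimal illustration of that encoding:

```python
text = "Mamba 🐍"
byte_ids = list(text.encode("utf-8"))   # vocabulary is exactly 0..255
print(byte_ids)                         # [77, 97, 109, 98, 97, 32, 240, 159, 144, 141]
decoded = bytes(byte_ids).decode("utf-8")
assert decoded == text                  # lossless, tokenizer-free round trip
```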
1 code implementation • 21 Jul 2023 • Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
Join order selection (JOS), the problem of ordering join operations to minimize total query execution cost, is the core NP-hard combinatorial optimization problem of query optimization.
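For context, the classic exact approach is Selinger-style dynamic programming over table subsets, whose state space grows exponentially with the number of relations; that blow-up is what motivates learned approaches. The sketch below shows the DP skeleton with a hypothetical placeholder cost model (real optimizers use cardinality estimates).

```python
from itertools import combinations

def join_cost(left: frozenset, right: frozenset) -> float:
    """Hypothetical placeholder; real optimizers use cardinality estimates."""
    return float(len(left) * len(right))

def proper_splits(subset: frozenset):
    """Yield every nonempty proper left side of a two-way split."""
    items = sorted(subset)
    for r in range(1, len(items)):
        for combo in combinations(items, r):
            yield frozenset(combo)

def best_plan_cost(tables: list[str]) -> float:
    """Selinger-style DP over table subsets; state space is exponential in n."""
    best = {frozenset([t]): 0.0 for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            best[subset] = min(
                best[left] + best[subset - left] + join_cost(left, subset - left)
                for left in proper_splits(subset)
            )
    return best[frozenset(tables)]

print(best_plan_cost(["A", "B", "C", "D"]))
```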
1 code implementation • 20 Dec 2022 • Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush
Even so, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
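A bidirectional SSM processes the sequence with one recurrence running left-to-right and another right-to-left, then fuses the two, which is what lets an SSM see both contexts the way BERT does. The toy scalar recurrence and the stacking-based fusion below are illustrative choices, not BiGS's exact parameterization.

```python
import numpy as np

def ssm_pass(x: np.ndarray, a: float = 0.9) -> np.ndarray:
    """Toy scalar SSM: h_t = a * h_{t-1} + x_t, recorded at every position."""
    h, out = 0.0, np.empty_like(x)
    for t, xt in enumerate(x):
        h = a * h + xt
        out[t] = h
    return out

x = np.random.default_rng(0).normal(size=12)
forward = ssm_pass(x)                   # left-to-right context
backward = ssm_pass(x[::-1])[::-1]      # right-to-left context
bidirectional = np.stack([forward, backward], axis=-1)  # illustrative fusion
```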
1 code implementation • 14 Oct 2021 • Junxiong Wang, Debabrota Basu, Immanuel Trummer
In black-box optimization problems, we aim to maximize an unknown objective function that is accessible only through the feedback of an evaluation or simulation oracle.
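In this setting the optimizer can only query the oracle and observe noisy feedback. The simplest baseline is random search, sketched below with a hypothetical oracle; bandit and Bayesian methods replace the uniform proposals with smarter ones informed by past evaluations.

```python
import random

def oracle(x: float) -> float:
    """Hypothetical black-box objective; only point evaluations are available."""
    return -(x - 3.0) ** 2 + random.gauss(0.0, 0.1)  # noisy feedback

def random_search(budget: int = 200, low: float = -10.0, high: float = 10.0):
    """Baseline black-box maximization: propose uniformly, keep the best."""
    best_x, best_y = None, float("-inf")
    for _ in range(budget):
        x = random.uniform(low, high)
        y = oracle(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

print(random_search())  # best_x should land near 3.0
```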
no code implementations • 24 Nov 2014 • Junxiong Wang, Hongzhi Wang, Chenxu Zhao
Many machine learning algorithms are inherently iterative, requiring large numbers of repeated passes to converge.
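Gradient descent is the canonical example of such an iterative workload: the same inexpensive update is applied many times until convergence. A minimal instance on a one-dimensional quadratic:

```python
def gradient_descent(grad, x0: float = 10.0, lr: float = 0.1, steps: int = 100) -> float:
    """Repeat the same cheap update many times until (near) convergence."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 2)^2, whose gradient is 2 * (x - 2).
print(gradient_descent(lambda x: 2.0 * (x - 2.0)))  # converges near 2.0
```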