Trending Research

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

ibm-granite/granite-code-models • 7 May 2024

Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously.

Code Generation • Decoder

716

1.09 stars / hour

KAN: Kolmogorov-Arnold Networks

Blealtan/efficient-kan • 30 Apr 2024

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).

2,354

1.08 stars / hour

RLHF Workflow: From Reward Modeling to Online RLHF

rlhflow/online-rlhf • 13 May 2024

We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature.

Chatbot • Language Modelling +1

0.93 stars / hour

A Multi-Level Superoptimizer for Tensor Programs

mirage-project/mirage • 9 May 2024

We introduce Mirage, the first multi-level superoptimizer for tensor programs.

Navigate

222

0.72 stars / hour

Improving Diffusion Models for Virtual Try-on

yisol/IDM-VTON • 8 Mar 2024

Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.

Ranked #1 on Virtual Try-on on VITON-HD

Virtual Try-on

2,394

0.70 stars / hour

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

microsoft/ms-marco-web-search • 13 May 2024

Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals.

Information Retrieval • Retrieval

227

0.66 stars / hour

Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

cohere-ai/magikarp • 8 May 2024

The disconnect between tokenizer creation and model training in language models has been known to allow for certain inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted behaviour.

Language Modelling • Large Language Model

0.59 stars / hour

Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting

ztxtech/Time-Evidence-Fusion-Network • 10 May 2024

TEFN is not a model that excels in any single aspect, but one that balances performance, accuracy, stability, and interpretability.

Ranked #2 on Time Series Forecasting on ETTm2 (720) Multivariate

Time Series • Time Series Forecasting

0.56 stars / hour

Linearizing Large Language Models

tri-ml/linear_open_lm • 10 May 2024

Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost.

In-Context Learning

0.55 stars / hour

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

gigaai-research/general-world-models-survey • 6 May 2024

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.

Autonomous Driving • Decision Making +1

131

0.54 stars / hour
