Granite Code Models: A Family of Open Foundation Models for Code Intelligence

ibm-granite/granite-code-models 7 May 2024

Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously.

Code Generation, Decoder

KAN: Kolmogorov-Arnold Networks

Blealtan/efficient-kan 30 Apr 2024

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).
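
For reference, the Kolmogorov-Arnold representation theorem named in this abstract says that a continuous multivariate function on a bounded domain can be written using only continuous univariate functions and addition. One standard statement is given below. The symbols are the conventional ones and do not come from this listing.

```latex
% Kolmogorov-Arnold representation theorem (standard form).
% f : [0,1]^n -> R is continuous; \Phi_q and \phi_{q,p} are continuous univariate functions.
\[
  f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
\]
```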

RLHF Workflow: From Reward Modeling to Online RLHF

rlhflow/online-rlhf 13 May 2024

In this technical report, we present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF), which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature.

Chatbot, Language Modelling

A Multi-Level Superoptimizer for Tensor Programs

mirage-project/mirage 9 May 2024

We introduce Mirage, the first multi-level superoptimizer for tensor programs.

Improving Diffusion Models for Virtual Try-on

yisol/IDM-VTON 8 Mar 2024

Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.

Virtual Try-on

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

microsoft/ms-marco-web-search 13 May 2024

Recent breakthroughs in large models have highlighted the critical significance of data scale, labels, and modalities.

Information Retrieval, Retrieval

Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

cohere-ai/magikarp 8 May 2024

The disconnect between tokenizer creation and model training in language models has been known to allow for certain inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted behaviour.

Language Modelling, Large Language Model

Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting

ztxtech/Time-Evidence-Fusion-Network 10 May 2024

TEFN is not a model that achieves the ultimate in any single aspect, but one that balances performance, accuracy, stability, and interpretability.

Time Series, Time Series Forecasting

Linearizing Large Language Models

tri-ml/linear_open_lm 10 May 2024

Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost.

In-Context Learning
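
The "fixed-size recurrent state" in the abstract above can be illustrated with a minimal sketch of generic causal linear attention. This is not the method of the listed paper. The function names, the `elu(x) + 1` feature map, and the normalization are assumptions chosen for illustration. The recurrent form keeps only a `(d_k, d_v)` matrix and a `d_k` vector per step, whatever the sequence length, and it matches the masked parallel form.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: a positive feature map (an illustrative assumption).
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention_recurrent(q, k, v):
    """Causal linear attention, one token at a time, with a fixed-size state."""
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))  # running sum of outer(phi(k_t), v_t)
    norm = np.zeros(d_k)          # running sum of phi(k_t)
    out = np.zeros((seq_len, d_v))
    for t in range(seq_len):
        phi_q, phi_k = feature_map(q[t]), feature_map(k[t])
        state += np.outer(phi_k, v[t])
        norm += phi_k
        out[t] = (phi_q @ state) / (phi_q @ norm)
    return out


def linear_attention_parallel(q, k, v):
    """Same computation in masked matrix form, quadratic in sequence length."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = np.tril(phi_q @ phi_k.T)  # causal mask: keep positions s <= t
    return (scores @ v) / scores.sum(axis=1, keepdims=True)
```

In the recurrent form the memory used per generated token does not grow with context length. That constant memory is the source of the lower inference cost the abstract refers to.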

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

gigaai-research/general-world-models-survey 6 May 2024

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.

Autonomous Driving, Decision Making

