Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Leeroo-AI/mergoo 12 Mar 2024

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.

Arithmetic Reasoning, Code Generation (+6 more)

176 stars (0.87 stars / hour)

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v 21 Mar 2024

To overcome the limitations of existing text-to-video methods, we introduce StreamingT2V, an autoregressive approach for generating long videos of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation, Video Generation

768 stars (0.84 stars / hour)

AutoCodeRover: Autonomous Program Improvement

nus-apr/auto-code-rover 8 Apr 2024

Recent progress in Large Language Models (LLMs) has significantly impacted the software development process: developers can now use LLM-based programming assistants to automate coding.

Bug Fixing, Code Search (+1 more)

1,893 stars (0.76 stars / hour)

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

soccernet/sn-gamestate 17 Apr 2024

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e., a minimap).

Camera Calibration

41 stars (0.75 stars / hour)

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

instantstyle/instantstyle 3 Apr 2024

Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization.

Text-to-Image Generation

1,032 stars (0.71 stars / hour)

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

microsoft/mechanistic-error-probe 26 Sep 2023

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.

20 stars (0.69 stars / hour)

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

lingyvkong/onechart 15 Apr 2024

To address the challenge of chart parsing, we propose OneChart: a reliable agent devised specifically for the structural extraction of chart information.

37 stars (0.68 stars / hour)

Robust Speech Recognition via Large-Scale Weak Supervision

ggerganov/whisper.cpp Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Ranked #1 on Speech Recognition on Common Voice Italian (using extra training data)

Robust Speech Recognition, Speech Recognition

30,903 stars (0.68 stars / hour)

ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model

chenhongruixuan/mambacd 4 Apr 2024

For the change decoder, which is shared by all three architectures, we propose three spatio-temporal relationship modeling mechanisms. These combine naturally with the Mamba architecture and fully exploit its attributes to achieve spatio-temporal interaction among multi-temporal features, thereby yielding accurate change information.

2D Semantic Segmentation, Attribute (+1 more)

152 stars (0.61 stars / hour)

TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting

ibm/tsfm 14 Jun 2023

TSMixer outperforms state-of-the-art MLP and Transformer forecasting models by a considerable margin of 8–60%.

Multivariate Time Series Forecasting, Representation Learning (+2 more)

148 stars (0.57 stars / hour)