DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding

idea-research/dino-x-api 21 Nov 2024

DINO-X employs the same Transformer-based encoder-decoder architecture as Grounding DINO 1.5 to pursue an object-level representation for open-world object understanding.
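
The entry above only names the architecture family, so the following is a heavily hedged, generic sketch of the open-vocabulary scoring step that grounding-style detectors in this family share: object-level embeddings produced by the decoder are matched against text-prompt embeddings to assign labels to boxes. All names, shapes, and the confidence threshold are illustrative assumptions, not DINO-X's API.

```python
# Generic open-vocabulary detection scoring sketch (illustrative, not DINO-X code).
import torch
import torch.nn.functional as F

num_queries, dim = 900, 256
prompts = ["person", "dog", "traffic light"]        # free-form category prompts

object_embeddings = torch.randn(num_queries, dim)   # stand-in for decoder object queries
text_embeddings = torch.randn(len(prompts), dim)    # stand-in for a text-encoder output
boxes = torch.rand(num_queries, 4)                  # (cx, cy, w, h), normalised

# Contrastive-style matching: cosine similarity between object and prompt embeddings.
scores = F.normalize(object_embeddings, dim=-1) @ F.normalize(text_embeddings, dim=-1).T
conf, label = scores.sigmoid().max(dim=-1)          # best-matching prompt per query

keep = conf > 0.4                                   # assumed confidence threshold
detections = [
    (prompts[int(l)], b.tolist(), float(c))
    for l, b, c in zip(label[keep], boxes[keep], conf[keep])
]
```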

Long-tailed Object Detection · Object · +4

402
0.78 stars / hour

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

convexsplatting/convex-splatting 22 Nov 2024

Our results highlight the potential of 3D Convex Splatting to become the new standard for high-quality scene reconstruction and novel view synthesis.

Novel View Synthesis

158
0.77 stars / hour

DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting

chenhoy/droid-splat 26 Nov 2024

Recent progress in scene synthesis makes it possible to build standalone SLAM systems purely by optimizing hyperprimitives with a rendering objective.
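
To illustrate the general recipe alluded to here, the toy sketch below optimizes scene primitives (isotropic 2D Gaussians with learnable position, scale, and colour) end-to-end against a purely photometric rendering loss. It is a minimal stand-in for "optimizing hyperprimitives with a rendering objective", not the DROID-Splat system; the 2D setup and all names are assumptions.

```python
# Toy example: fit primitive parameters to an image with a photometric rendering loss.
import torch

H, W, num_primitives = 64, 64, 200
target = torch.rand(H, W, 3)                                 # stand-in for an observed frame

# Learnable primitive parameters.
pos = torch.rand(num_primitives, 2, requires_grad=True)      # normalised (x, y) centres
log_scale = torch.full((num_primitives,), -3.0, requires_grad=True)
color = torch.rand(num_primitives, 3, requires_grad=True)

ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
pixels = torch.stack([xs, ys], dim=-1).reshape(-1, 2)        # (H*W, 2) pixel coordinates

def render():
    """Splat every Gaussian onto the pixel grid and blend colours by weight."""
    d2 = ((pixels[:, None, :] - pos[None, :, :]) ** 2).sum(-1)    # (H*W, P) squared distances
    w = torch.exp(-d2 / (2 * torch.exp(log_scale) ** 2 + 1e-6))   # Gaussian weights
    img = (w @ color) / (w.sum(-1, keepdim=True) + 1e-6)          # weighted colour blend
    return img.reshape(H, W, 3)

opt = torch.optim.Adam([pos, log_scale, color], lr=1e-2)
for _ in range(200):
    loss = (render() - target).abs().mean()                  # photometric (L1) rendering objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```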

Camera Calibration · Depth Estimation · +1

164
0.73 stars / hour

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

lllyasviel/ic-light CVPR 2024

We introduce a co-designed approach for human portrait relighting that combines a physics-guided architecture with a pre-training framework.

6,396
0.68 stars / hour

Cautious Optimizers: Improving Training with One Line of Code

kyleliang919/c-optim 25 Nov 2024

In this work, we propose a single-line modification in PyTorch to any momentum-based optimizer, which we rename Cautious Optimizer, e.g. C-AdamW and C-Lion.
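
The recipe behind the "one line" can be sketched as follows, under the assumption that the modification masks out update components whose sign disagrees with the current gradient and rescales the survivors. This is an illustrative PyTorch reimplementation; `cautious_mask` and the toy momentum loop are hypothetical, not the repository's C-AdamW/C-Lion code.

```python
# Cautious-masking sketch: drop update components that disagree in sign with the gradient.
import torch

def cautious_mask(update: torch.Tensor, grad: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Zero update components whose sign disagrees with the gradient, then rescale."""
    mask = (update * grad > 0).to(update.dtype)     # 1 where update and grad agree in sign
    mask = mask / mask.mean().clamp(min=eps)        # rescale so the mean mask value is ~1
    return update * mask

# Toy usage: cautious SGD-with-momentum on a single parameter tensor.
w = torch.randn(10, requires_grad=True)
momentum = torch.zeros_like(w)
lr, beta = 0.1, 0.9

for _ in range(100):
    loss = (w ** 2).sum()                           # placeholder objective
    loss.backward()
    with torch.no_grad():
        momentum.mul_(beta).add_(w.grad)            # standard momentum buffer
        w -= lr * cautious_mask(momentum, w.grad)   # "cautious" step instead of plain momentum
        w.grad.zero_()
```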

120
0.63 stars / hour

FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations

hmrishavbandy/flipsketch 16 Nov 2024

Sketch animations offer a powerful medium for visual storytelling, from simple flip-book doodles to professional studio productions.

Visual Storytelling

196
0.63 stars / hour

MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

modelscope/ClearerVoice-Studio 23 Feb 2023

To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.
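
The joint local/global attention described above can be illustrated with a small sketch: exact softmax attention inside fixed-size chunks, plus a linearised, O(N) attention over the whole sequence, with the two branches summed. This is a generic approximation of the idea, not the MossFormer block; the function name, the elu(x)+1 feature map, and the additive combination are assumptions.

```python
# Joint local (exact, chunked) and global (linearised) self-attention sketch.
import torch
import torch.nn.functional as F

def joint_attention(q, k, v, chunk_size=64):
    """q, k, v: (batch, seq_len, dim); seq_len assumed divisible by chunk_size."""
    b, n, d = q.shape

    # Local branch: exact softmax attention inside non-overlapping chunks.
    qc = q.reshape(b, n // chunk_size, chunk_size, d)
    kc = k.reshape(b, n // chunk_size, chunk_size, d)
    vc = v.reshape(b, n // chunk_size, chunk_size, d)
    local = F.scaled_dot_product_attention(qc, kc, vc)   # attention within each chunk
    local = local.reshape(b, n, d)

    # Global branch: linearised attention with an elu(x)+1 feature map over the full sequence.
    qg, kg = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", kg, v)             # summarise keys/values in O(N)
    z = 1.0 / (torch.einsum("bnd,bd->bn", qg, kg.sum(dim=1)) + 1e-6)
    global_out = torch.einsum("bnd,bde,bn->bne", qg, kv, z)

    return local + global_out

q = k = v = torch.randn(2, 256, 64)
out = joint_attention(q, k, v)                            # (2, 256, 64)
```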

 Ranked #1 on Speech Separation on WSJ0-2mix-16k (using extra training data)

Speech Separation

182
0.61 stars / hour

MARS: Unleashing the Power of Variance Reduction for Training Large Models

AGI-Arena/MARS 15 Nov 2024

Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models.

Stochastic Optimization

284
0.60 stars / hour

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

allenai/open-instruct 13 Jun 2024

High-quality preference data leads to improvements of up to 8% in instruction following and truthfulness.

Instruction Following · Math

1,951
0.58 stars / hour

Streaming Deep Reinforcement Learning Finally Works

mohmdelsayed/streaming-drl 18 Oct 2024

This paper introduces the stream-x algorithms, the first class of deep RL algorithms to overcome the stream barrier for both prediction and control and to match the sample efficiency of batch RL.

Atari Games · Deep Reinforcement Learning · +3

142
0.58 stars / hour