Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

apple/axlearn 8 May 2023

In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.

Adversarial Text Retrieval

765 stars (0.97 stars / hour)

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

stanfordnlp/pyvene 12 Mar 2024

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.

Model Editing

241 stars (0.93 stars / hour)

Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting

KU-CVLAB/RAIN-GS 14 Mar 2024

Through extensive analysis of SfM initialization in the frequency domain and analysis of a 1D regression task with multiple 1D Gaussians, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting), that successfully trains 3D Gaussians from random point clouds.

3D Reconstruction, Novel View Synthesis

147 stars (0.86 stars / hour)

Score-Guided Diffusion for 3D Human Recovery

statho/scorehmr 14 Mar 2024

We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction.

Denoising, Human Mesh Recovery

104 stars (0.85 stars / hour)

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

levihsu/ootdiffusion 4 Mar 2024

We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON).

Denoising, Image Generation

3,717 stars (0.72 stars / hour)

GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer

urchade/gliner 14 Nov 2023

Named Entity Recognition (NER) is essential in various Natural Language Processing (NLP) applications.

Named Entity Recognition

252 stars (0.72 stars / hour)

Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

baai-agents/cradle 5 Mar 2024

Despite the success in specific tasks and scenarios, existing foundation agents, empowered by large models (LMs) and advanced tools, still cannot generalize to different scenarios, mainly due to dramatic differences in the observations and actions across scenarios.

Efficient Exploration

450 stars (0.70 stars / hour)

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Doubiiu/DynamiCrafter 18 Oct 2023

Animating a still image offers an engaging visual experience.

Image Animation

1,104 stars (0.69 stars / hour)

ORPO: Monolithic Preference Optimization without Reference Model

xfactlab/orpo 12 Mar 2024

While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence.

74 stars (0.68 stars / hour)

Data Interpreter: An LLM Agent For Data Science

geekan/metagpt 28 Feb 2024

Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness.

Language Modelling, Large Language Model

36,237 stars (0.61 stars / hour)