Trending Research

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Costwen/Ouroboros3D • 5 Jun 2024

However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results.

3D Generation 3D Reconstruction +2

0.48 stars / hour


LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

openbmb/minicpm-v • 18 Mar 2024

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

7,114

0.46 stars / hour


Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

vikparuchuri/marker • 11 Jan 2021

We design models based on T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.

Language Modelling Question Answering

12,422

0.45 stars / hour


Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

SALT-NLP/demonstrated-feedback • 2 Jun 2024

Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points.

Imitation Learning Language Modelling

0.44 stars / hour


Improving Alignment and Robustness with Circuit Breakers

blackswan-ai/short-circuiting • 6 Jun 2024

Existing techniques aimed at improving alignment, such as refusal training, are often bypassed.

Adversarial Robustness

0.43 stars / hour


Generalizable Human Gaussians from Single-View Image

jinnan-chen/HGM • 10 Jun 2024

To this end, we propose a single-view generalizable Human Gaussian Model (HGM), a diffusion-guided framework for 3D human modeling from a single image.

SSIM

0.42 stars / hour


CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

richard-peng-xia/cares • 10 Jun 2024

Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare.

Fairness

0.42 stars / hour


L-MAGIC: Language Model Assisted Generation of Images with Coherence

intellabs/mmpano • CVPR 2024

However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for each view.

Depth Estimation Language Modelling +2

0.41 stars / hour


ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

explainableml/reno • 6 Jun 2024

Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely used open-source models such as SDXL and PixArt-$\alpha$, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time.

0.41 stars / hour


Pre-training Small Base LMs with Fewer Tokens

Lightning-AI/lit-gpt • 12 Apr 2024

Here we show that smaller LMs trained using some of the layers of GPT-2-medium (355M) and GPT-2-large (770M) can effectively match the validation loss of their bigger counterparts when trained from scratch for the same number of training steps on the OpenWebText dataset with 9B tokens.

Language Modelling

7,835

0.41 stars / hour
