Isomorphic Pruning for Vision Models

vainf/isomorphic-pruning 5 Jul 2024

For instance, we improve the accuracy of DeiT-Tiny from 74.52% to 77.50% by pruning an off-the-shelf DeiT-Base model.

24
0.63 stars / hour

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

test-time-training/ttt-lm-jax 5 Jul 2024

We evaluate our instantiations at the scale of 125M to 1.3B parameters, comparing with a strong Transformer and Mamba, a modern RNN.

16k 8k +1

204
0.62 stars / hour

Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

AiuniAI/Unique3D 30 May 2024

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability.

Image to 3D Single-View 3D Reconstruction +1

2,329
0.58 stars / hour

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

open-mmlab/foleycrafter 1 Jul 2024

Meanwhile, the temporal controller incorporates an onset detector and a timestamp-based adapter to achieve precise audio-video alignment.

Audio Generation Video Alignment +1

187
0.51 stars / hour

Model Predictive Optimized Path Integral Strategies

iit-dlslab/quadruped-pympc 30 Mar 2022

We generalize the derivation of model predictive path integral control (MPPI) to allow for a single joint distribution across controls in the control sequence.

136
0.50 stars / hour
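For context, the sketch below shows one iteration of standard MPPI, the baseline this paper generalizes. It samples noise independently at each time step, whereas the paper's contribution is a single joint distribution across the whole control sequence; that generalization is not shown here. The single-integrator dynamics, quadratic cost, horizon, `sigma`, and temperature `lam` are all illustrative assumptions, and the function names are hypothetical.

```python
import math
import random


def rollout_cost(x0, controls, target, dt=0.1, r=0.01):
    """Roll out toy single-integrator dynamics (assumption) and sum a quadratic cost."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + u * dt
        cost += (x - target) ** 2 + r * u ** 2
    return cost


def mppi_step(x0, nominal, target, n_samples=256, sigma=1.0, lam=1.0, rng=None):
    """One standard MPPI update of the nominal control sequence."""
    rng = rng or random.Random(0)
    horizon = len(nominal)
    noises, costs = [], []
    for _ in range(n_samples):
        # Standard MPPI: independent Gaussian noise at every time step.
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        noises.append(eps)
        perturbed = [u + e for u, e in zip(nominal, eps)]
        costs.append(rollout_cost(x0, perturbed, target))
    # Exponentiated-cost weights; subtracting the minimum keeps exp() stable.
    beta = min(costs)
    weights = [math.exp(-(c - beta) / lam) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Shift each control by the cost-weighted average of the sampled noise.
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noises))
        for t, u in enumerate(nominal)
    ]
```

Usage: start from `nominal = [0.0] * 10`, call `nominal = mppi_step(0.0, nominal, 1.0, rng=rng)` repeatedly with a shared `random.Random` instance, and the rollout cost of `nominal` should fall as the controls push the state toward the target.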

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

InternLM/InternLM-XComposer 22 Feb 2024

We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.

Hallucination

2,199
0.47 stars / hour

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

InternLM/InternLM-XComposer 21 Nov 2023

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.

Descriptive visual instruction following +2

2,199
0.44 stars / hour

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

buaacyw/meshanything 14 Jun 2024

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement.

Decoder

1,663
0.44 stars / hour

BM25S: Orders of magnitude faster lexical search via eager sparse scoring

xhluca/bm25s 4 Jul 2024

We introduce BM25S, an efficient Python-based implementation of BM25 that only depends on NumPy and SciPy.

Passage Retrieval Text Retrieval +1

616
0.44 stars / hour
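The "eager sparse scoring" idea in the BM25S title can be illustrated as follows: BM25 scores for every (term, document) pair are computed once at index time and stored sparsely, so a query reduces to summing precomputed entries. This is a minimal pure-Python sketch of that idea, not the BM25S library's API (which uses NumPy/SciPy sparse matrices). The whitespace tokenizer, the `k1`/`b` defaults, the choice of one common IDF variant, and the names `build_index` and `search` are assumptions.

```python
import math
from collections import Counter, defaultdict


def build_index(docs, k1=1.5, b=0.75):
    """Eagerly precompute the BM25 score of every (term, doc) pair."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency of each term
    index = defaultdict(dict)  # sparse storage: term -> {doc_id: score}
    for doc_id, toks in enumerate(tokenized):
        tf = Counter(toks)
        dl = len(toks)
        for term, freq in tf.items():
            # One common non-negative IDF variant (assumption).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = freq + k1 * (1 - b + b * dl / avgdl)
            index[term][doc_id] = idf * freq * (k1 + 1) / denom
    return index, n_docs


def search(index, n_docs, query, top_k=3):
    """Score a query by summing the precomputed entries of its terms."""
    scores = [0.0] * n_docs
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            scores[doc_id] += score
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]
```

Because all term-document weights are fixed at indexing time, query cost scales with the number of stored entries for the query's terms rather than with recomputing BM25 over the corpus.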