Long-CLIP: Unlocking the Long-Text Capability of CLIP

beichenzbc/long-clip 22 Mar 2024

Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities.

Image Retrieval Language Modelling +3

135
1.09 stars / hour

A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

qiushisun/ncisurvey 21 Mar 2024

Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains.

99
1.00 stars / hour

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

donydchen/mvsplat 21 Mar 2024

We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images.

3D Reconstruction Generalizable Novel View Synthesis +2

262
0.82 stars / hour

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

williamyang1991/fresco 19 Mar 2024

In this paper, we introduce FRESCO, which uses intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint.

Translation valid

507
0.76 stars / hour

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

h-zhao1997/cobra 21 Mar 2024

In recent years, the application of multimodal large language models (MLLM) in various fields has achieved remarkable success.

Language Modelling Large Language Model

87
0.75 stars / hour

LLM4Decompile: Decompiling Binary Code with Large Language Models

albertan017/LLM4Decompile 8 Mar 2024

Therefore, we release the first open-access decompilation LLMs ranging from 1B to 33B pre-trained on 4 billion tokens of C source code and the corresponding assembly code.

2,109
0.66 stars / hour

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

opengvlab/internvideo2 22 Mar 2024

We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Action Classification Action Recognition +12

97
0.66 stars / hour

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

hiyouga/llama-factory 20 Mar 2024

Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.

Language Modelling Text Generation

14,686
0.64 stars / hour

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

hiyouga/llama-efficient-tuning 4 Aug 2023

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization Language Modelling +5

14,702
0.64 stars / hour

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

justimyhxu/grm 21 Mar 2024

We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s.

3D Reconstruction Image to 3D +1

184
0.57 stars / hour