CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

thudm/cogvideo 12 Aug 2024

To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep fusion between the two modalities.

Text-to-Video Generation, Video Alignment

7,071 stars (0.40 stars / hour)
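The CogVideoX abstract names an expert adaptive LayerNorm but does not describe how it works. The Python below is a minimal sketch of one plausible reading. It assumes that each modality (text, video) has its own small linear "expert" that predicts a LayerNorm shift and scale from a shared conditioning vector, such as a diffusion timestep embedding. `ExpertAdaLN`, `layer_norm`, `linear` and all shapes are hypothetical. The sketch also omits gating and the rest of the model, so it is not the CogVideoX implementation.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weight, bias, x):
    """Matrix-vector product; `weight` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


class ExpertAdaLN:
    """Hypothetical expert adaptive LayerNorm: one modulation expert per modality.

    Each expert maps the conditioning vector to a (shift, scale) pair, so text and
    video tokens in the same sequence get different condition-dependent affines.
    """

    def __init__(self, dim, cond_dim, modalities=("text", "video"), seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.experts = {}
        for m in modalities:
            # Output layout: first `dim` values are shifts, next `dim` are scales.
            weight = [[rng.gauss(0.0, 0.02) for _ in range(cond_dim)] for _ in range(2 * dim)]
            bias = [0.0] * (2 * dim)
            self.experts[m] = (weight, bias)

    def __call__(self, tokens, modality_ids, cond):
        out = []
        for tok, m in zip(tokens, modality_ids):
            weight, bias = self.experts[m]  # pick the expert for this token's modality
            mod = linear(weight, bias, cond)
            shift, scale = mod[: self.dim], mod[self.dim :]
            normed = layer_norm(tok)
            out.append([n * (1.0 + s) + sh for n, s, sh in zip(normed, scale, shift)])
        return out


# Usage: the same vector, once as a text token and once as a video token.
ada = ExpertAdaLN(dim=4, cond_dim=3)
tokens = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
out = ada(tokens, ["text", "video"], cond=[0.5, -1.0, 2.0])
```

Because each modality has its own expert, identical token vectors are modulated differently for text and video. With a zero condition and zero biases, both reduce to plain LayerNorm.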

MemLong: Memory-Augmented Retrieval for Long Text Modeling

bui1dmysea/memlong 30 Aug 2024

This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context language modeling by utilizing an external retriever for historical information retrieval.

4k, Decoder

49 stars (0.39 stars / hour)
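The MemLong abstract describes an external retriever that fetches historical information for long-context modeling. The Python below is a minimal sketch of the general retrieval step only. It stores embeddings of past text chunks and returns the most similar ones by cosine similarity. `ChunkMemory` and the toy 2-D embeddings are hypothetical. This is not MemLong's retriever, and the sketch does not show how retrieved content is fed back into the model.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ChunkMemory:
    """Hypothetical external memory of past text chunks, keyed by embedding."""

    def __init__(self):
        self.entries = []  # list of (embedding, chunk_text)

    def add(self, embedding, chunk):
        self.entries.append((list(embedding), chunk))

    def retrieve(self, query, k=2):
        """Return the k stored chunks most similar to the query embedding."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


# Usage with toy embeddings: the query is closest to the two "cat" chunks.
mem = ChunkMemory()
mem.add([1.0, 0.0], "chunk about cats")
mem.add([0.0, 1.0], "chunk about cars")
mem.add([0.9, 0.1], "another cat chunk")
hits = mem.retrieve([1.0, 0.05], k=2)
```

A linear scan is used for clarity. A real system over many chunks would use an approximate nearest-neighbour index instead.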

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

showlab/show-o 22 Aug 2024

We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation.

Question Answering, Text-to-Image Generation

748 stars (0.39 stars / hour)

FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

hku-mars/fast-livo2 26 Aug 2024

The fusion of both visual and LiDAR measurements is based on a single unified voxel map where the LiDAR module constructs the geometric structure for registering new LiDAR scans and the visual module attaches image patches to the LiDAR points.

Visual Odometry

657 stars (0.38 stars / hour)
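The FAST-LIVO2 abstract describes a single voxel map in two roles. The LiDAR module builds geometric structure in the map, and the visual module attaches image patches to the stored LiDAR points. The Python below is a minimal sketch of that shared data-structure idea only. It assumes a hash map keyed by integer voxel coordinates. `VoxelMap`, `insert_point`, `attach_patch` and the string "patches" are hypothetical. FAST-LIVO2's actual map and update logic are considerably more involved.

```python
import math
from collections import defaultdict


class VoxelMap:
    """Hypothetical unified voxel map shared by a LiDAR and a visual module.

    Points are bucketed into cubic voxels via a hash map keyed by integer voxel
    coordinates; each stored point can optionally carry an image patch.
    """

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self.voxels = defaultdict(list)  # voxel key -> list of point records

    def key(self, p):
        """Integer voxel coordinates of a 3-D point."""
        return tuple(math.floor(c / self.voxel_size) for c in p)

    def insert_point(self, p):
        """LiDAR side: add a registered point to the geometric structure."""
        record = {"xyz": tuple(p), "patch": None}
        self.voxels[self.key(p)].append(record)
        return record

    def attach_patch(self, p, patch):
        """Visual side: attach a patch to the stored point nearest p within p's voxel."""
        candidates = self.voxels.get(self.key(p), [])  # .get avoids creating empty voxels
        if not candidates:
            return False
        nearest = min(candidates, key=lambda r: math.dist(r["xyz"], p))
        nearest["patch"] = patch
        return True


# Usage: three LiDAR points in two voxels, then one patch attached visually.
vm = VoxelMap(voxel_size=0.5)
a = vm.insert_point((0.1, 0.2, 0.3))
b = vm.insert_point((0.4, 0.1, 0.0))
c = vm.insert_point((1.2, 0.0, 0.0))
attached = vm.attach_patch((0.38, 0.12, 0.0), "patch_A")
```

Hashing voxel coordinates keeps voxel lookup average O(1) regardless of how large the mapped area grows. Searching only inside the query's own voxel is a simplification, because a true nearest neighbour can sit in an adjacent voxel.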

OxfordVGG Submission to the EGO4D AV Transcription Challenge

m-bain/whisperx 18 Jul 2023

This report presents the technical details of our submission to the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team.

Automatic Speech Recognition, speech-recognition

11,323 stars (0.37 stars / hour)

Open-Set Biometrics: Beyond Good Closed-Set Models

Recognito-Vision/Linux-FaceRecognition-FaceLivenessDetection 23 Jul 2024

Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery.

Face Recognition, Gait Recognition

774 stars (0.36 stars / hour)
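The Open-Set Biometrics abstract contrasts closed-set identification, which assumes every probe subject is enrolled in the gallery, with the open-set case. The Python below is a minimal sketch of that distinction using the standard rejection-threshold formulation. In closed-set mode the nearest gallery identity is always returned. In open-set mode the probe is rejected as unknown when its best similarity falls below a threshold. `identify`, the cosine scoring and the toy embeddings are assumptions for illustration. This is not the paper's proposed method.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(probe, gallery, threshold=None):
    """Return the best-matching gallery identity for a probe embedding.

    threshold=None: closed-set mode, always return the nearest identity.
    Otherwise: open-set mode, return None (unknown subject) when the best
    score is below the threshold.
    """
    best_id, best_score = None, -math.inf
    for identity, template in gallery.items():
        score = cosine(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    if threshold is not None and best_score < threshold:
        return None
    return best_id


# Usage: an enrolled subject and a stranger who is not in the gallery.
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
known = [0.95, 0.1]
stranger = [0.6, -0.8]
closed_set_guess = identify(stranger, gallery)             # forced to pick someone
open_set_guess = identify(stranger, gallery, threshold=0.8)  # may reject
```

The closed-set call must assign the stranger to some enrolled identity, which is exactly the assumption the abstract questions. The threshold trades false accepts of unknown subjects against false rejects of enrolled ones.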

Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

Recognito-Vision/Face-SDK-Linux-Demos 26 Jun 2024

Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future.

Diversity, Face Recognition

774 stars (0.36 stars / hour)

Explainable Person Re-Identification with Attribute-guided Metric Distillation

SheldongChen/AMD.github.io ICCV 2021

In this paper, we propose a post-hoc method, named Attribute-guided Metric Distillation (AMD), to explain existing ReID models.

Attribute, Person Re-Identification

117 stars (0.35 stars / hour)

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

PJLab-ADG/DriveArena 1 Aug 2024

This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios.

166 stars (0.34 stars / hour)