Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

qwenlm/qwen-audio 14 Nov 2023

Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.

Instruction Following

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

dvlab-research/llama-vid 28 Nov 2023

Current VLMs, while proficient in tasks like image captioning and visual question answering, face heavy computational burdens when processing long videos because of the excessive number of visual tokens.

Image Captioning, Video-based Generative Performance Benchmarking, +2

Adversarial Diffusion Distillation

stability-ai/generative-models 28 Nov 2023

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality.

Image Generation

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

ailab-cvc/unireplknet 27 Nov 2023

We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristic that distinguishes large kernels from small ones: they can see wide without going deep.

Ranked #1 on Object Detection on COCO 2017 (mAP metric)

Image Classification, Object Detection, +3

Qwen Technical Report

QwenLM/Qwen-7B 28 Sep 2023

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.

Language Modelling, Large Language Model, +1

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

ZiqiaoPeng/SyncTalk 29 Nov 2023

A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses.

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

king159/pair-net 17 Jul 2023

Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.

Graph Generation, Panoptic Scene Graph Generation, +1

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

pku-yuangroup/chat-univi 14 Nov 2023

Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal conversations.

Image-based Generative Performance Benchmarking, Language Modelling, +9

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

envision-research/luciddreamer 19 Nov 2023

Recent advances in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios.

Text to 3D

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

lizhe00/animatablegaussians 27 Nov 2023

Overall, our method can create lifelike avatars with dynamic, realistic and generalized appearances.

