VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

mbzuai-oryx/videogpt-plus 13 Jun 2024

Building on the advances of language models, Large Multimodal Models (LMMs) have contributed significant improvements in video understanding.

Dense Video Captioning, VCGBench-Diverse

On the Measure of Intelligence

fchollet/ARC 5 Nov 2019

To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans.


OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

opengvlab/omnicorpus 12 Jun 2024

In this paper, we introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset.

In-Context Learning

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

gersteinlab/ml-bench 16 Nov 2023

Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), which requires a deeper comprehension of complex file interactions.

Code Generation, Navigate

CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making

cleandiffuserteam/cleandiffuser 13 Jun 2024

By revisiting the roles of DMs in the decision-making domain, we identify a set of essential sub-modules that constitute the core of CleanDiffuser, allowing for the implementation of various DM algorithms with simple and flexible building blocks.

Decision Making

MegActor: Harness the Power of Raw Video for Vivid Portrait Animation

megvii-research/megfaceanimate 31 May 2024

Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks, they are seldom the subject of research in the field of portrait animation.

Style Transfer, Synthetic Data Generation

AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

donahowe/AutoStudio 3 Jun 2024

As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, i.e., multi-turn interactive image generation, begins to attract the attention of related research communities.

Image Generation

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

line/libritts-p 12 Jun 2024

We employ a hybrid approach to construct prompt annotations: (1) manual annotations that capture human perceptions of speaker characteristics and (2) synthetic annotations on speaking style.

ProG: A Graph Prompt Learning Benchmark

sheldonresearch/ProG 8 Jun 2024

Artificial general intelligence on graphs has shown significant advancements across various applications, yet the traditional 'Pre-train & Fine-tune' paradigm faces inefficiencies and negative transfer issues, particularly in complex and few-shot settings.

Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

AiuniAI/Unique3D 30 May 2024

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability.

Image to 3D, Single-View 3D Reconstruction

