Trending Research

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Francis-Rings/MotionEditor • 30 Nov 2023

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.

Video Editing

0.28 stars / hour

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

thudm/inf-dit • 7 May 2024

However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g. 4096×4096), the resolution of generated images is often limited to 1024×1024.

Image Generation, Super-Resolution

195

0.28 stars / hour

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

state-spaces/mamba • 31 May 2024

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale.

Language Modelling

10,770

0.27 stars / hour

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

wzzheng/occsora • 30 May 2024

To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving.

Autonomous Driving, Decision Making

0.27 stars / hour

Mistral 7B

skypilot-org/skypilot • 10 Oct 2023

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Ranked #4 on Zero-Shot Video Question Answer on NExT-GQA

Arithmetic Reasoning, Chatbot +9

6,006

0.27 stars / hour

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR • 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning, Sentiment Analysis +1

4,085

0.27 stars / hour

Faithful Logical Reasoning via Symbolic Chain-of-Thought

aiden0526/symbcot • 28 May 2024

Technically, building upon an LLM, SymbCoT 1) first translates the natural language context into a symbolic format, 2) then derives a step-by-step plan to solve the problem with symbolic logical rules, and 3) finally uses a verifier to check the translation and the reasoning chain.

Logical Reasoning

0.27 stars / hour
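
The three SymbCoT stages named in the abstract above (translate, plan and solve, verify) can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the `LLM` callable type, the prompt prefixes, the `SymbolicResult` class, and the `stub_llm` stand-in are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class SymbolicResult:
    symbolic_context: str   # output of stage 1
    reasoning_chain: str    # output of stage 2
    verified: bool          # verdict of stage 3


def symbolic_cot(context: str, question: str, llm: LLM) -> SymbolicResult:
    """Run a translate -> plan/solve -> verify pipeline (assumed prompt format)."""
    # Stage 1: translate the natural language context into a symbolic format.
    symbolic = llm(f"TRANSLATE\n{context}\n{question}")
    # Stage 2: derive a step-by-step plan and solve it with logical rules.
    chain = llm(f"PLAN_AND_SOLVE\n{symbolic}")
    # Stage 3: a verifier checks both the translation and the reasoning chain.
    verdict = llm(f"VERIFY\n{context}\n{symbolic}\n{chain}")
    is_valid = verdict.strip().lower().startswith("valid")
    return SymbolicResult(symbolic, chain, is_valid)


def stub_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without a real model."""
    if prompt.startswith("TRANSLATE"):
        return "Human(socrates); forall x. Human(x) -> Mortal(x)"
    if prompt.startswith("PLAN_AND_SOLVE"):
        return "1. Human(socrates)  2. apply rule  3. Mortal(socrates)"
    return "valid"


result = symbolic_cot(
    "Socrates is a human. All humans are mortal.",
    "Is Socrates mortal?",
    stub_llm,
)
```

Swapping `stub_llm` for a real model client would leave the control flow unchanged; the actual prompts and symbolic format used by SymbCoT are not specified in this listing.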

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

zhu-zhiyu/nvs_solver • 24 May 2024

By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training.

Novel View Synthesis

138

0.27 stars / hour

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

wangyuchi369/InstructAvatar • 24 May 2024

Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable.

123

0.27 stars / hour

CARTE: Pretraining and Transfer for Tabular Learning

soda-inria/carte • 26 Feb 2024

The architecture -- CARTE for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embeddings of entries and column names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries.

Data Integration, Transfer Learning

0.26 stars / hour
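
To make the idea of a graph representation of table rows in the CARTE abstract more concrete, here is a minimal sketch of one way a row could be turned into a small graph whose edges carry column names. It is an assumption-laden illustration, not CARTE's actual code: the star layout with a `<row>` center node, the `RowGraph` class, and `row_to_graph` are hypothetical, and the string embedding and graph-attention steps are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class RowGraph:
    nodes: list = field(default_factory=list)  # node labels as strings
    edges: list = field(default_factory=list)  # (source, target, column_name)


def row_to_graph(row: dict) -> RowGraph:
    """Build a star graph: one center node, one leaf per non-missing cell."""
    graph = RowGraph(nodes=["<row>"])  # index 0 is the assumed center node
    for column, value in row.items():
        if value is None:
            continue  # a missing cell contributes no node and no edge
        graph.nodes.append(str(value))
        # The column name labels the edge, so no fixed schema is required.
        graph.edges.append((0, len(graph.nodes) - 1, column))
    return graph


# Two rows from tables with entirely different columns map to the same structure.
movie_graph = row_to_graph({"title": "Alien", "year": 1979, "director": "Ridley Scott"})
firm_graph = row_to_graph({"name": "Inria", "country": "France", "founded": None})
```

Because column names live on the edges rather than in a fixed feature layout, rows from tables with different columns end up in one shared representation, which is the property the abstract highlights.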
