MotionEditor: Editing Video Motion via Content-Aware Diffusion

Francis-Rings/MotionEditor 30 Nov 2023

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, so that the editing branch retains the original background and protagonist appearance.

Video Editing

81
0.28 stars / hour

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

thudm/inf-dit 7 May 2024

However, because memory grows quadratically when generating ultra-high-resolution images (e.g., 4096×4096), the resolution of generated images is often limited to 1024×1024.

Image Generation Super-Resolution

195
0.28 stars / hour

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

state-spaces/mamba 31 May 2024

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale.

Language Modelling

10,770
0.27 stars / hour

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

wzzheng/occsora 30 May 2024

To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving.

Autonomous Driving Decision Making

58
0.27 stars / hour

Mistral 7B

skypilot-org/skypilot 10 Oct 2023

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Arithmetic Reasoning Chatbot +9

6,006
0.27 stars / hour

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

4,085
0.27 stars / hour

Faithful Logical Reasoning via Symbolic Chain-of-Thought

aiden0526/symbcot 28 May 2024

Technically, building upon an LLM, SymbCoT 1) first translates the natural language context into a symbolic format, 2) then derives a step-by-step plan to solve the problem with symbolic logical rules, and 3) finally uses a verifier to check the translation and reasoning chain.

Logical Reasoning

60
0.27 stars / hour

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

zhu-zhiyu/nvs_solver 24 May 2024

By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training.

Novel View Synthesis

138
0.27 stars / hour

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

wangyuchi369/InstructAvatar 24 May 2024

Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable.

123
0.27 stars / hour

CARTE: Pretraining and Transfer for Tabular Learning

soda-inria/carte 26 Feb 2024

The architecture -- CARTE for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embeddings of entries and column names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries.

Data Integration Transfer Learning

13
0.26 stars / hour