Search Results for author: Willi Menapace

Found 20 papers, 6 papers with code

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

no code implementations • 17 Jul 2024 Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

Recently, new methods have demonstrated the ability to generate videos with controllable camera poses; these techniques leverage pre-trained U-Net-based diffusion models that explicitly disentangle spatial and temporal generation.

Video Generation

VIMI: Grounding Video Generation through Multi-modal Instruction

no code implementations • 8 Jul 2024 Yuwei Fang, Willi Menapace, Aliaksandr Siarohin, Tsai-Shien Chen, Kuan-Chien Wang, Ivan Skorokhodov, Graham Neubig, Sergey Tulyakov

In the first stage, we propose a multimodal conditional video generation framework for pretraining on these augmented datasets, establishing a foundational model for grounded video generation.

Text-to-Video Generation Video Generation +1

Taming Data and Transformers for Audio Generation

no code implementations arXiv 2024 Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez

Generating ambient sounds and effects is a challenging problem due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task.

Ranked #2 on Audio captioning on AudioCaps (using extra training data)

Audio captioning Audio Generation +2

SF-V: Single Forward Video Generation Model

no code implementations • 6 Jun 2024 Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process.

Denoising Video Generation

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

1 code implementation CVPR 2024 Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected, and then apply the model to the whole dataset to select the best caption as the annotation.

Text Retrieval Video Captioning +2

Delving into CLIP latent space for Video Anomaly Recognition

1 code implementation • 4 Oct 2023 Luca Zanella, Benedetta Liberatori, Willi Menapace, Fabio Poiesi, Yiming Wang, Elisa Ricci

We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision.

Anomaly Detection Multiple Instance Learning +1

Interactive Neural Painting

no code implementations • 31 Jul 2023 Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP.

Decoder

Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

no code implementations • 23 Mar 2023 Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.

Navigate

InfiniCity: Infinite-Scale City Synthesis

no code implementations ICCV 2023 Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noise.

Image Generation Neural Rendering

Quantum Motion Segmentation

no code implementations • 24 Mar 2022 Federica Arrigoni, Willi Menapace, Marcel Seelbach Benkner, Elisa Ricci, Vladislav Golyanik

Motion segmentation is a challenging problem that seeks to identify independent motions in two or more input images.

Motion Segmentation Segmentation

Learning to Cluster under Domain Shift

1 code implementation ECCV 2020 Willi Menapace, Stéphane Lathuilière, Elisa Ricci

While unsupervised domain adaptation methods based on deep architectures have achieved remarkable success in many computer vision tasks, they rely on a strong assumption, i.e., labeled source data must be available.

Clustering Deep Clustering +1
