no code implementations • 19 Dec 2024 • Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Alper Canberk, Kwot Sin Lee, Vicente Ordonez, Sergey Tulyakov
We propose AV-Link, a unified framework for Video-to-Audio and Audio-to-Video generation that leverages the activations of frozen video and audio diffusion models for temporally-aligned cross-modal conditioning.
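Below is a minimal, hypothetical sketch of what activation-based cross-modal conditioning could look like, assuming the frozen video model exposes intermediate features and the audio denoiser accepts them through a cross-attention bridge. All module and method names (`FrozenVideoDiffusion`-style model, `embed`, `decode`, `CrossModalAttention`) are illustrative assumptions, not the actual AV-Link implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Cross-attention from audio tokens (queries) to frozen video
    activations (keys/values), giving temporally aligned conditioning."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_tokens: torch.Tensor,
                video_feats: torch.Tensor) -> torch.Tensor:
        # audio_tokens: (B, T_audio, dim); video_feats: (B, T_video, dim)
        attended, _ = self.attn(query=self.norm(audio_tokens),
                                key=video_feats, value=video_feats)
        return audio_tokens + attended  # residual conditioning


def video_to_audio_step(frozen_video_model: nn.Module,
                        audio_denoiser: nn.Module,
                        bridge: CrossModalAttention,
                        video: torch.Tensor,
                        noisy_audio: torch.Tensor,
                        t: torch.Tensor) -> torch.Tensor:
    # 1) Run the frozen video diffusion model and keep its intermediate
    #    activations (assumed to be what the call below returns).
    with torch.no_grad():
        video_feats = frozen_video_model(video, t)        # (B, T_video, dim)

    # 2) Condition the audio denoiser on those activations.
    audio_tokens = audio_denoiser.embed(noisy_audio, t)   # (B, T_audio, dim)
    audio_tokens = bridge(audio_tokens, video_feats)
    return audio_denoiser.decode(audio_tokens)            # predicted noise
```

The Audio-to-Video direction would mirror this sketch with the roles of the frozen and trainable models swapped.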
no code implementations • 17 Oct 2024 • Ruoshi Liu, Alper Canberk, Shuran Song, Carl Vondrick
Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings.
no code implementations • 31 Aug 2024 • Alper Canberk, Maksym Bondarenko, Ege Ozguroglu, Ruoshi Liu, Carl Vondrick
With this scalable automatic data generation pipeline, we can create a dataset for learning object insertion, which is used to train our proposed text-conditioned diffusion model.
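As a rough illustration, a training step for such a text-conditioned diffusion model might look like the sketch below, trained on automatically generated scene/edited-image pairs. The dataset fields, the `add_noise` helper, and the denoiser interface are assumptions for illustration, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def training_step(denoiser, text_encoder, batch, add_noise, num_timesteps=1000):
    """One denoising step: predict the noise added to the edited image,
    conditioned on the background scene and the text prompt."""
    scene = batch["scene"]                       # background image, (B, C, H, W)
    target = batch["edited"]                     # image with the object inserted
    prompt_emb = text_encoder(batch["prompt"])   # (B, L, dim)

    # Sample a diffusion timestep and add Gaussian noise to the target.
    t = torch.randint(0, num_timesteps, (target.shape[0],), device=target.device)
    noise = torch.randn_like(target)
    noisy = add_noise(target, noise, t)          # assumed forward-process helper

    # Concatenate the clean scene as spatial conditioning (one common choice).
    model_in = torch.cat([noisy, scene], dim=1)
    pred = denoiser(model_in, t, prompt_emb)
    return F.mse_loss(pred, noise)
```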