no code implementations • 16 Jan 2025 • Nanye Ma, Shangyuan Tong, HaoLin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, Saining Xie
In this work, we explore the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation.
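For illustration, a minimal sketch of one way such inference-time scaling can be exercised: draw several initial noises, run the sampler on each, and keep the candidate that a verifier scores highest. The function names (run_sampler, score) and the best-of-N strategy shown here are illustrative assumptions, not the paper's exact pipeline.

    import torch

    def run_sampler(noise: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real implementation would run the full denoising loop
        # of a pretrained diffusion model starting from this noise.
        return noise.tanh()

    def score(image: torch.Tensor) -> float:
        # Placeholder verifier, e.g. a CLIP- or reward-model-based scorer.
        return float(image.mean())

    def best_of_n(n: int = 8, shape=(3, 64, 64)) -> torch.Tensor:
        # Spend extra inference compute searching over initial noises instead of
        # (or in addition to) adding more denoising steps.
        candidates = [run_sampler(torch.randn(shape)) for _ in range(n)]
        return max(candidates, key=score)

    sample = best_of_n()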
no code implementations • 15 Oct 2024 • Hsin-Ping Huang, Xinyi Wang, Yonatan Bitton, Hagai Taitelbaum, Gaurav Singh Tomar, Ming-Wei Chang, Xuhui Jia, Kelvin C. K. Chan, Hexiang Hu, Yu-Chuan Su, Ming-Hsuan Yang
Using KITTEN, we conduct a systematic study on the fidelity of entities in text-to-image generation models, focusing on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
2 code implementations • 13 Aug 2024 • Imagen-Team-Google, :, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, YiChang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Siavash Khodadadeh, Yelin Kim, Ksenia Konyushkova, Karol Langner, Eric Lau, Rory Lawton, Shixin Luo, Soňa Mokrá, Henna Nandwani, Yasumasa Onoe, Aäron van den Oord, Zarana Parekh, Jordi Pont-Tuset, Hang Qi, Rui Qian, Deepak Ramachandran, Poorva Rane, Abdullah Rashwan, Robert Riachi, Hansa Srinivasan, Srivatsan Srinivasan, Robin Strudel, Benigno Uria, Oliver Wang, Su Wang, Austin Waters, Chris Wolff, Auriel Wright, Zhisheng Xiao, Hao Xiong, Keyang Xu, Marc van Zee, Junlin Zhang, Katie Zhang, Wenlei Zhou, Konrad Zolna, Ola Aboubakar, Canfer Akbulut, Oscar Akerlund, Isabela Albuquerque, Nina Anderson, Marco Andreetto, Lora Aroyo, Ben Bariach, David Barker, Sherry Ben, Dana Berman, Courtney Biles, Irina Blok, Pankil Botadra, Jenny Brennan, Karla Brown, John Buckley, Rudy Bunel, Elie Bursztein, Christina Butterfield, Ben Caine, Viral Carpenter, Norman Casagrande, Ming-Wei Chang, Solomon Chang, Shamik Chaudhuri, Tony Chen, John Choi, Dmitry Churbanau, Nathan Clement, Matan Cohen, Forrester Cole, Mikhail Dektiarev, Vincent Du, Praneet Dutta, Tom Eccles, Ndidi Elue, Ashley Feden, Shlomi Fruchter, Frankie Garcia, Roopal Garg, Weina Ge, Ahmed Ghazy, Bryant Gipson, Andrew Goodman, Dawid Górny, Sven Gowal, Khyatti Gupta, Yoni Halpern, Yena Han, Susan Hao, Jamie Hayes, Jonathan Heek, Amir Hertz, Ed Hirst, Emiel Hoogeboom, Tingbo Hou, Heidi Howard, Mohamed Ibrahim, Dirichi Ike-Njoku, Joana Iljazi, Vlad Ionescu, William Isaac, Reena Jana, Gemma Jennings, Donovon Jenson, Xuhui Jia, Kerry Jones, Xiaoen Ju, Ivana Kajic, Christos Kaplanis, Burcu Karagol Ayan, Jacob Kelly, Suraj Kothawade, Christina Kouridi, Ira Ktena, Jolanda Kumakaw, Dana Kurniawan, Dmitry Lagun, Lily Lavitas, Jason Lee, Tao Li, Marco Liang, Maggie Li-Calis, Yuchi Liu, Javier Lopez Alberca, Matthieu Kim Lorrain, Peggy Lu, Kristian Lum, Yukun Ma, Chase Malik, John Mellor, Thomas Mensink, Inbar Mosseri, Tom Murray, Aida Nematzadeh, Paul Nicholas, Signe Nørly, João Gabriel Oliveira, Guillermo Ortiz-Jimenez, Michela Paganini, Tom Le Paine, Roni Paiss, Alicia Parrish, Anne Peckham, Vikas Peswani, Igor Petrovski, Tobias Pfaff, Alex Pirozhenko, Ryan Poplin, Utsav Prabhu, Yuan Qi, Matthew Rahtz, Cyrus Rashtchian, Charvi Rastogi, Amit Raul, Ali Razavi, Sylvestre-Alvise Rebuffi, Susanna Ricco, Felix Riedel, Dirk Robinson, Pankaj Rohatgi, Bill Rosgen, Sarah Rumbley, MoonKyung Ryu, Anthony Salgado, Tim Salimans, Sahil Singla, Florian Schroff, Candice Schumann, Tanmay Shah, Eleni Shaw, Gregory Shaw, Brendan Shillingford, Kaushik Shivakumar, Dennis Shtatnov, Zach Singer, Evgeny Sluzhaev, Valerii Sokolov, Thibault Sottiaux, Florian Stimberg, Brad Stone, David Stutz, Yu-Chuan Su, Eric Tabellion, Shuai Tang, David Tao, Kurt Thomas, Gregory Thornton, Andeep Toor, Cristian Udrescu, Aayush Upadhyay, Cristina Vasconcelos, Alex Vasiloff, Andrey Voynov, Amanda Walker, Luyu Wang, Miaosen Wang, Simon Wang, Stanley Wang, Qifei Wang, Yuxiao Wang, Ágoston Weisz, Olivia Wiles, Chenxia Wu, Xingyu Federico Xu, Andrew Xue, Jianbo Yang, Luo Yu, Mete Yurtoglu, Ali Zand, Han Zhang, Jiageng Zhang, Catherine Zhao, Adilet Zhaxybay, Miao Zhou, Shengqi Zhu, Zhenkai Zhu, Dawn Bloxwich, 
Mahyar Bordbar, Luis C. Cobo, Eli Collins, Shengyang Dai, Tulsee Doshi, Anca Dragan, Douglas Eck, Demis Hassabis, Sissie Hsiao, Tom Hume, Koray Kavukcuoglu, Helen King, Jack Krawczyk, Yeqing Li, Kathy Meier-Hellstern, Andras Orban, Yury Pinsky, Amar Subramanya, Oriol Vinyals, Ting Yu, Yori Zwols
We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts.
no code implementations • 22 May 2024 • Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli
Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process.
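A minimal sketch of the kind of per-modality timestep sampling this points at, assuming a standard DDPM-style forward process; the tensor shapes and names below are illustrative, not taken from the paper. Sampling independent noise levels for audio and video exposes the model to every combination of noise levels, so one modality can act as a clean condition while the other is generated.

    import torch

    def forward_diffuse(x, t, alphas_cumprod):
        # Standard DDPM forward process: q(x_t | x_0) = N(sqrt(a_t) x_0, (1 - a_t) I).
        a = alphas_cumprod[t].view(-1, *([1] * (x.dim() - 1)))
        return a.sqrt() * x + (1 - a).sqrt() * torch.randn_like(x)

    T = 1000
    alphas_cumprod = torch.cumprod(1 - torch.linspace(1e-4, 0.02, T), dim=0)

    video = torch.randn(4, 3, 8, 32, 32)  # toy batch of video clips
    audio = torch.randn(4, 1, 16000)      # toy batch of audio waveforms

    # Independent timesteps per modality: setting one modality's timestep near 0
    # turns it into a clean conditioning signal for the other.
    t_video = torch.randint(0, T, (4,))
    t_audio = torch.randint(0, T, (4,))
    noisy_video = forward_diffuse(video, t_video, alphas_cumprod)
    noisy_audio = forward_diffuse(audio, t_audio, alphas_cumprod)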
no code implementations • CVPR 2024 • Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia
We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.
no code implementations • 5 Dec 2023 • Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang
To achieve detailed control, we propose a unified framework to jointly inject control signals into the existing text-to-video model.
no code implementations • ICCV 2023 • Yang Zhao, Tingbo Hou, Yu-Chuan Su, Xuhui Jia, Yandong Li, Matthias Grundmann
An authentic face restoration system is increasingly in demand in many computer vision applications, e.g., image enhancement, video communication, and portrait photography.
no code implementations • 27 Apr 2023 • Kangning Liu, Yu-Chuan Su, Wei Hong, Ruijin Cang, Xuhui Jia
The one-shot talking-head synthesis task aims to animate a source image to another pose and expression, which is dictated by a driving frame.
no code implementations • 15 Apr 2023 • Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang
We tackle the long video generation problem, i.e., generating videos beyond the output length of video generation models.
no code implementations • 14 Apr 2023 • Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia
Our approach greatly reduces the overhead for personalized image generation and is more applicable in many potential applications.
no code implementations • 5 Apr 2023 • Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su
This paper proposes a method for generating images of customized objects specified by users.
no code implementations • CVPR 2022 • Yang Zhao, Yu-Chuan Su, Chun-Te Chu, Yandong Li, Marius Renn, Yukun Zhu, Changyou Chen, Xuhui Jia
While existing approaches for face restoration make significant progress in generating high-quality faces, they often fail to preserve facial features and cannot authentically reconstruct the faces.
1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong
To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.
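As a hypothetical illustration (field names and label vocabularies are assumed, not taken from the dataset release), one such 2.5D relationship can be thought of as an annotation on an ordered object pair recording occlusion and relative depth:

    from dataclasses import dataclass

    @dataclass
    class Rel25D:
        # Hypothetical record layout for a single 2.5D relationship annotation.
        image_id: str
        subject_box: tuple[float, float, float, float]   # (x1, y1, x2, y2)
        object_box: tuple[float, float, float, float]
        occlusion: str   # e.g. "subject occludes object" / "no occlusion"
        depth: str       # e.g. "subject closer" / "object closer" / "same"

    rel = Rel25D("img_0001", (10, 20, 120, 200), (90, 30, 260, 220),
                 occlusion="subject occludes object", depth="subject closer")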
no code implementations • 15 Apr 2021 • Yu-Chuan Su, Raviteja Vemulapalli, Ben Weiss, Chun-Te Chu, Philip Andrew Mansfield, Lior Shapira, Colvin Pitts
To address this issue, we propose a deep learning-based approach that provides suggestions to the photographer on how to adjust the camera view before capturing.
no code implementations • CVPR 2019 • Yu-Chuan Su, Kristen Grauman
KTNs efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images.
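A toy sketch of the row-dependent convolution idea this describes: on an equirectangular image the distortion depends only on latitude (the image row), so a different kernel bank can be applied at each row. This is an assumption-laden illustration, not the published KTN architecture, and the helper name row_dependent_conv is made up.

    import torch
    import torch.nn.functional as F

    def row_dependent_conv(x, kernels):
        # x: (B, C, H, W) equirectangular feature map
        # kernels: (H, C_out, C, 3, kw), one kernel bank per image row
        B, C, H, W = x.shape
        xp = F.pad(x, (0, 0, 1, 1), mode="replicate")   # pad one row top and bottom
        rows = []
        for r in range(H):
            band = xp[:, :, r:r + 3, :]                  # 3-row band centred on row r
            out = F.conv2d(band, kernels[r], padding=(0, kernels.shape[-1] // 2))
            rows.append(out[:, :, 0, :])                 # keep the centre row's response
        return torch.stack(rows, dim=2)

    x = torch.randn(1, 3, 16, 32)
    kernels = torch.randn(16, 8, 3, 3, 3)
    y = row_dependent_conv(x, kernels)                   # (1, 8, 16, 32)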
no code implementations • CVPR 2018 • Yu-Chuan Su, Kristen Grauman
Standard video encoders developed for conventional narrow field-of-view video are widely applied to 360° video as well, with reasonable results.
no code implementations • 12 Dec 2017 • Yu-Chuan Su, Kristen Grauman
Standard video encoders developed for conventional narrow field-of-view video are widely applied to 360° video as well, with reasonable results.
no code implementations • NeurIPS 2017 • Yu-Chuan Su, Kristen Grauman
While 360° cameras offer tremendous new possibilities in vision, graphics, and augmented reality, the spherical images they produce make core feature extraction non-trivial.
no code implementations • CVPR 2017 • Yu-Chuan Su, Kristen Grauman
360° video requires human viewers to actively control "where" to look while watching the video.
no code implementations • 1 Mar 2017 • Yu-Chuan Su, Kristen Grauman
360° video requires human viewers to actively control "where" to look while watching the video.
no code implementations • 7 Dec 2016 • Yu-Chuan Su, Dinesh Jayaraman, Kristen Grauman
AutoCam leverages normal field-of-view (NFOV) web video to discriminatively identify space-time "glimpses" of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories.
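A toy NumPy sketch of the dynamic-programming selection mentioned here: pick one candidate viewing direction per time step so that the summed glimpse scores are maximized while large angular jumps between consecutive steps are penalized. The scores, angles, and smoothness cost are made-up stand-ins for the paper's learned quantities.

    import numpy as np

    def select_trajectory(scores, angles, smooth=1.0):
        # scores: (T, K) interest score of each of K candidate directions per step
        # angles: (K,) azimuth of each candidate direction, in degrees
        T, K = scores.shape
        diff = np.abs(angles[:, None] - angles[None, :])
        jump = np.minimum(diff, 360.0 - diff)          # angular distance between candidates
        best = scores[0].copy()                        # best cumulative score ending at each candidate
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            cand = best[:, None] - smooth * jump       # rows: previous candidate, cols: current
            back[t] = cand.argmax(axis=0)
            best = cand.max(axis=0) + scores[t]
        path = [int(best.argmax())]                    # backtrack the highest-scoring trajectory
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    scores = np.random.rand(10, 12)                    # 10 time steps, 12 candidate directions
    angles = np.arange(12) * 30.0
    print(select_trajectory(scores, angles))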
no code implementations • 4 Apr 2016 • Yu-Chuan Su, Kristen Grauman
In a wearable camera video, we see what the camera wearer sees.
no code implementations • 1 Apr 2016 • Yu-Chuan Su, Kristen Grauman
Current approaches for activity recognition often ignore constraints on computational resources: 1) they rely on extensive feature computation to obtain rich descriptors on all frames, and 2) they assume batch-mode access to the entire test video at once.
no code implementations • 15 Sep 2014 • Yu-Chuan Su, Tzu-Hsuan Chiu, Chun-Yen Yeh, Hsin-Fu Huang, Winston H. Hsu
The same lack-of-training-samples problem limits the use of deep models in a wide range of computer vision problems where obtaining training data is difficult.