3 code implementations • 7 Jan 2025 • Nvidia, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
no code implementations • 12 Dec 2024 • Yue Feng, Vaibhav Sanjay, Spencer Lutz, Badour AlBahar, Songwei Ge, Jia-Bin Huang
Automatically generating multiview illusions is a compelling challenge, where a single piece of visual content offers distinct interpretations from different viewing perspectives.
no code implementations • 13 Jun 2024 • David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors.
no code implementations • 6 Jun 2024 • Quynh Phung, Songwei Ge, Jia-Bin Huang
Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge.
1 code implementation • 18 Apr 2024 • Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang
We show that FVD with features extracted from the recent large-scale self-supervised video models is less biased toward image quality.
no code implementations • CVPR 2024 • Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang
Fréchet Video Distance (FVD), a prominent metric for evaluating video generation models, is known to occasionally conflict with human perception.
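FVD fits a Gaussian to feature vectors extracted from real and generated videos and reports the Fréchet distance between the two Gaussians; which network supplies the features is exactly what these two entries examine. A minimal sketch of the distance itself, using toy random arrays in place of real video-model embeddings (the feature extractor and dimensions here are illustrative, not the authors' setup):

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets:
    ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 (cov_a cov_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) via eigenvalues, avoiding an explicit matrix sqrt
    eigs = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigs.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(256, 16))  # stand-ins for real-video features
fake = rng.normal(0.5, 1.0, size=(256, 16))  # mean-shifted "generated" features
```

The metric is zero when the two feature distributions match and grows as they diverge; FVD's behavior therefore hinges entirely on what the feature space encodes.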
no code implementations • CVPR 2024 • Quynh Phung, Songwei Ge, Jia-Bin Huang
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image synthesis methods have shown compelling results.
no code implementations • ICCV 2023 • Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji
Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy.
Ranked #7 on Text-to-Video Generation on UCF-101
1 code implementation • ICCV 2023 • Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang
For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections.
no code implementations • 16 Feb 2023 • Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar, Jia-Bin Huang
There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models, enabling versatile downstream applications such as 3D object synthesis from text, image editing, and customized generation.
1 code implementation • CVPR 2023 • Songwei Ge, Shlok Mishra, Simon Kornblith, Chun-Liang Li, David Jacobs
To exploit such a structure, we propose a contrastive learning framework where a Euclidean loss is used to learn object representations and a hyperbolic loss is used to encourage representations of scenes to lie close to representations of their constituent objects in a hyperbolic space.
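The hyperbolic loss described here rests on the geodesic distance in the Poincaré ball, where points near the boundary are exponentially far from points near the origin. A sketch of that distance function with hypothetical 2-D embeddings (the actual framework trains these embeddings contrastively; only the distance is shown):

```python
import numpy as np

def poincare_distance(x, y, eps=1e-7):
    """Geodesic distance in the Poincaré ball (inputs must have norm < 1):
    arccosh(1 + 2 ||x - y||^2 / ((1 - ||x||^2)(1 - ||y||^2)))."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq / (denom + eps)))

# toy embeddings: an object near the origin, one scene nearby, one far away
obj = np.array([0.1, 0.0])
scene_near = np.array([0.3, 0.0])
scene_far = np.array([0.0, 0.9])
```

Minimizing this distance between a scene and its constituent objects induces the hierarchy the entry describes: general concepts settle near the origin, specific ones near the boundary.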
2 code implementations • 17 Apr 2022 • Thomas Hayes, Songyang Zhang, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, Devi Parikh
Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.
1 code implementation • 7 Apr 2022 • Songwei Ge, Thomas Hayes, Harry Yang, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh
Videos are created to express emotion, exchange information, and share experiences.
Ranked #22 on Video Generation on UCF-101
1 code implementation • NeurIPS 2021 • Songwei Ge, Shlok Mishra, Haohan Wang, Chun-Liang Li, David Jacobs
We also show that model bias favors texture and shape features differently under different test settings.
no code implementations • 27 Jun 2021 • Songwei Ge, Devi Parikh
We ask the question: to what extent can recent large-scale language and image generation models blend visual concepts?
1 code implementation • NeurIPS 2021 • Songwei Ge, Vasu Singla, Ronen Basri, David Jacobs
Using this, we prove that shift invariance in neural networks produces adversarial examples for the simple case of two classes, each consisting of a single image with a black or white dot on a gray background.
1 code implementation • ICLR 2021 • Songwei Ge, Vedanuj Goswami, C. Lawrence Zitnick, Devi Parikh
Sketching or doodling is a popular creative activity that people engage in.
no code implementations • ICLR 2020 • Haohan Wang, Xindi Wu, Songwei Ge, Zachary C. Lipton, Eric P. Xing
Recent research has shown that CNNs are often overly sensitive to high-frequency textural patterns.
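Studies of this sensitivity typically decompose an image into low- and high-frequency parts with an FFT mask and probe how a CNN responds to each part separately. A minimal sketch of that decomposition (the radius threshold and image here are illustrative placeholders):

```python
import numpy as np

def split_frequencies(img, radius):
    """Split a grayscale image into low/high-frequency components using a
    circular mask in the shifted 2-D Fourier domain."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * (~mask))).real
    return low, high

img = np.random.rand(32, 32)          # placeholder image
low, high = split_frequencies(img, radius=8)
```

Because the mask and its complement partition the spectrum, the two components sum back to the original image, so any difference in the model's predictions on them isolates frequency-dependent behavior.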
no code implementations • 8 Dec 2019 • Austin Dill, Songwei Ge, Eunsu Kang, Chun-Liang Li, Barnabas Poczos
The typical approach for incorporating this creative process is to interpolate in a learned latent space, exploiting the model's learned structure to avoid generating unrealistic instances.
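Latent interpolation is often done spherically rather than linearly so that intermediate codes keep a plausible norm under a Gaussian prior. A sketch of that common technique (the latent dimension and codes are arbitrary stand-ins, not tied to any particular model):

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent codes; at t=0 it
    returns z0 and at t=1 it returns z1, tracing an arc in between."""
    omega = np.arccos(np.clip(np.dot(z0 / np.linalg.norm(z0),
                                     z1 / np.linalg.norm(z1)), -1.0, 1.0))
    if np.isclose(omega, 0.0):            # nearly parallel: fall back to lerp
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=64), rng.normal(size=64)
path = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 5)]
```

Decoding each code along `path` yields the gradual morph between two generated instances that such interpolation-based approaches rely on.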
no code implementations • 8 Dec 2019 • Austin Dill, Chun-Liang Li, Songwei Ge, Eunsu Kang
In this work, we explore the idea that effective generative models for point clouds under the autoencoding framework must acknowledge the relationship between a continuous surface, a discretized mesh, and a set of points sampled from the surface.
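The mesh-to-point-set step in that pipeline is usually area-weighted barycentric sampling: pick a triangle in proportion to its area, then pick a uniform point inside it. A sketch under those standard assumptions (the example mesh is a hypothetical flat square, not data from the paper):

```python
import numpy as np

def sample_points(vertices, faces, n, rng):
    """Uniformly sample n points from a triangle-mesh surface."""
    tris = vertices[faces]                                # (F, 3, 3)
    a, b, c = tris[:, 0], tris[:, 1], tris[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    idx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    u, v = rng.random(n), rng.random(n)
    flip = u + v > 1.0                                    # fold into the triangle
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    return a[idx] + u[:, None] * (b - a)[idx] + v[:, None] * (c - a)[idx]

# unit square in the z=0 plane, as two triangles
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
pts = sample_points(verts, faces, 1000, np.random.default_rng(0))
```

Running the sampler twice with different seeds gives two distinct point sets from the same surface, which is precisely the continuous-surface/discrete-samples relationship the entry argues an autoencoder must respect.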
no code implementations • 20 Aug 2019 • Songwei Ge, Curtis Xuan, Ruihua Song, Chao Zou, Wei Liu, Jin Zhou
In this paper, we address the problem of automatically adding sound effects to radio stories with a retrieval-based model.
no code implementations • 20 Aug 2019 • Songwei Ge, Austin Dill, Eunsu Kang, Chun-Liang Li, Lingyao Zhang, Manzil Zaheer, Barnabas Poczos
We explore the intersection of human and machine creativity by generating sculptural objects through machine learning.
no code implementations • 20 Aug 2019 • Songwei Ge, Zhicheng Dou, Zhengbao Jiang, Jian-Yun Nie, Ji-Rong Wen
Our analysis reveals that the attention model is able to attribute higher weights to more related past sessions after fine training.
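The weighting described here is the standard softmax-attention pattern: score each past-session representation against the current query, then normalize. A toy sketch with hypothetical 2-D session vectors (the real model learns these representations; only the weighting mechanism is shown):

```python
import numpy as np

def attention_weights(query, session_reprs):
    """Softmax attention over past-session representations; a larger weight
    marks a session as more relevant to the current query."""
    scores = session_reprs @ query
    exp = np.exp(scores - scores.max())   # subtract max for stability
    return exp / exp.sum()

query = np.array([1.0, 0.0])
sessions = np.array([[0.9, 0.1],   # closely related past session
                     [0.0, 1.0]])  # unrelated past session
w = attention_weights(query, sessions)
```

After training, the finding quoted above corresponds to `w` concentrating on the sessions most similar to the current search context.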
4 code implementations • NeurIPS 2019 • Haohan Wang, Songwei Ge, Eric P. Xing, Zachary C. Lipton
Despite their renowned predictive power on i.i.d.
Ranked #114 on Domain Generalization on PACS
no code implementations • 13 Nov 2018 • Chun-Liang Li, Eunsu Kang, Songwei Ge, Lingyao Zhang, Austin Dill, Manzil Zaheer, Barnabas Poczos
Our approach extends DeepDream from images to 3D point clouds.