no code implementations • 4 Jun 2025 • Hermann Kumbong, Xian Liu, Tsung-Yi Lin, Ming-Yu Liu, Xihui Liu, Ziwei Liu, Daniel Y. Fu, Christopher Ré, David W. Romero
During inference, an image is generated by predicting all the tokens in the next (higher-resolution) scale, conditioned on all tokens in all previous (lower-resolution) scales.
no code implementations • 23 Apr 2025 • Yujie Qin, Ming He, Changyong Yu, Ming Ni, Xian Liu, Xiaochen Bo
The de novo design of proteins refers to creating proteins with specific structures and functions that do not naturally exist.
1 code implementation • 17 Apr 2025 • Kaiyue Sun, Xian Liu, Yao Teng, Xihui Liu
Personalized image synthesis has emerged as a pivotal application in text-to-image generation, enabling the creation of images featuring specific subjects in diverse contexts.
2 code implementations • 18 Mar 2025 • Nvidia, :, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Xinglong Sun, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng
We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge.
1 code implementation • 6 Mar 2025 • Yuqi Hu, Longguang Wang, Xian Liu, Ling-Hao Chen, Yuwei Guo, Yukai Shi, Ce Liu, Anyi Rao, Zeyu Wang, Hui Xiong
Understanding and replicating the real world is a critical challenge in Artificial General Intelligence (AGI) research.
4 code implementations • 7 Jan 2025 • Nvidia, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups.
no code implementations • CVPR 2025 • Hermann Kumbong, Xian Liu, Tsung-Yi Lin, Ming-Yu Liu, Xihui Liu, Ziwei Liu, Daniel Y. Fu, Christopher Re, David W. Romero
HMAR reformulates next-scale prediction as a Markovian process, wherein prediction of each resolution scale is conditioned only on tokens in its immediate predecessor instead of the tokens in all predecessor resolutions.
no code implementations • 10 Dec 2024 • Xiao Fu, Xian Liu, Xintao Wang, Sida Peng, Menghan Xia, Xiaoyu Shi, Ziyang Yuan, Pengfei Wan, Di Zhang, Dahua Lin
Previous methods on controllable video generation primarily leverage 2D control signals to manipulate object motions and have achieved remarkable synthesis results.
2 code implementations • 2 Oct 2024 • Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.
no code implementations • 26 Sep 2024 • Jiaxiang Tang, Zhaoshuo Li, Zekun Hao, Xian Liu, Gang Zeng, Ming-Yu Liu, Qinsheng Zhang
Current auto-regressive mesh generation methods suffer from issues such as incompleteness, insufficient detail, and poor generalization.
1 code implementation • 30 Jul 2024 • Yuxuan Bian, Ailing Zeng, Xuan Ju, Xian Liu, Zhaoyang Zhang, Wei Liu, Qiang Xu
However, employing a unified model to achieve various generation tasks with different condition modalities presents two main challenges: motion distribution drifts across different tasks (e. g., co-speech gestures and text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e. g., text and audio).
1 code implementation • CVPR 2025 • Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu
In this work, we conduct the first systematic study on compositional text-to-video generation.
no code implementations • CVPR 2024 • Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren
Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments.
no code implementations • 26 Mar 2024 • Sherwin Bahmani, Xian Liu, Wang Yifan, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
1 code implementation • 11 Mar 2024 • Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, Qiang Xu
Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs).
no code implementations • 11 Jan 2024 • Yifan Gong, Zheng Zhan, Qing Jin, Yanyu Li, Yerlan Idelbayev, Xian Liu, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren
One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networks (GANs).
1 code implementation • CVPR 2024 • Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu
In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.
no code implementations • 12 Oct 2023 • Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where each branch in the model complements to each other with both structural awareness and textural richness.
1 code implementation • ICCV 2023 • Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
In the second stage, for each semantics, we randomly sample slots from the corresponding Gaussian distribution and perform masked feature aggregation within the semantic area to exploit temporal correspondence patterns for instance identification.
no code implementations • 19 Jul 2023 • Lingting Zhu, Zeyue Xue, Zhenchao Jin, Xian Liu, Jingzhen He, Ziwei Liu, Lequan Yu
This paradigm extends the 2D image diffusion model to a volumetric version with a slightly increasing number of parameters and computation, offering a principled solution for generic cross-modality 3D medical image synthesis.
1 code implementation • CVPR 2023 • Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, Kwan-Yee Lin
Recent works propose to graft a deformation network into the NeRF to further model the dynamics of the human neural field for animating vivid human motions.
1 code implementation • CVPR 2023 • Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu
In this work, we propose a novel diffusion-based framework, named Diffusion Co-Speech Gesture (DiffGesture), to effectively capture the cross-modal audio-to-gesture associations and preserve temporal coherence for high-fidelity audio-driven co-speech gesture generation.
no code implementations • 17 Feb 2023 • Yuzheng Wang, Zuhao Ge, Zhaoyu Chen, Xian Liu, Chuangjia Ma, Yunquan Sun, Lizhe Qi
Data-free knowledge distillation is a challenging model lightweight task for scenarios in which the original dataset is not available.
no code implementations • 5 Dec 2022 • Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu
Our key insight is that the co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics.
1 code implementation • 26 Jul 2022 • Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
In this paper, we propose a novel learning scheme for self-supervised video representation learning.
1 code implementation • 20 Jul 2022 • Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, Jianmin Zheng
This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation.
1 code implementation • CVPR 2022 • Xian Liu, Qianyi Wu, Hang Zhou, Yinghao Xu, Rui Qian, Xinyi Lin, Xiaowei Zhou, Wayne Wu, Bo Dai, Bolei Zhou
To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations.
Ranked #3 on
Gesture Generation
on TED Gesture Dataset
no code implementations • 14 Feb 2022 • Cong Sun, Xian Liu, Bile Peng, Eduard Jorswieck
A two-user downlink network aided by a reconfigurable intelligent surface is considered.
1 code implementation • 13 Feb 2022 • Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou
Specifically, we observe that the previous practice of learning only a single audio representation is insufficient due to the additive nature of audio signals.
no code implementations • 19 Jan 2022 • Xian Liu, Yinghao Xu, Qianyi Wu, Hang Zhou, Wayne Wu, Bolei Zhou
Moreover, to enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.
no code implementations • 2 Nov 2021 • Mohammad Arani, Mohsen Momenitabar, Zhila Dehdari Ebrahimi, Xian Liu
In this paper, a blood supply chain network, where the occurrence of disruption might interrupt the flow of Red Blood Cells, is dealt with.
no code implementations • 29 Sep 2021 • Yuanqi Du, Xian Liu, Shengchao Liu, Bolei Zhou
In this work, we develop a simple yet effective method to interpret the latent space of the learned generative models with various molecular properties for more interactive molecule generation and discovery.
1 code implementation • ICCV 2021 • Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin
The crux of self-supervised video representation learning is to build general features from unlabeled videos.
no code implementations • 28 May 2021 • Mohammad Arani, Saeed Abdolmaleki, Maryam Maleki, Mohsen Momenitabar, Xian Liu
This study offers a step-by-step practical procedure from the analysis of the current status of the spare parts inventory system to advanced service-level analysis by virtue of simulation-optimization technique for a real-world case study associated with a seaport.
2 code implementations • ECCV 2020 • Junting Dong, Qing Shuai, Yuanqing Zhang, Xian Liu, Xiaowei Zhou, Hujun Bao
Therefore, we propose to capture human motion by jointly analyzing these Internet videos instead of using single videos separately.
no code implementations • 2 Mar 2020 • Mohammad Arani, Mousaalreza Dastmard, Zhila Dehdari Ebrahimi, Mohsen Momenitabar, Xian Liu
Furthermore, a rational presumption is reflected in the problem statement in which the time and cost of PM are pertinent to the interval between the prior perfect repair and current PM.