no code implementations • 21 Feb 2025 • Xiuwei Chen, Sihao Lin, Xiao Dong, Zisheng Chen, Meng Cao, Jianhua Han, Hang Xu, Xiaodan Liang
Nevertheless, training specialized subquadratic architectures from scratch for certain tasks is both resource-intensive and time-consuming.
no code implementations • 3 Feb 2025 • Zilong Wang, Zhiyang Dou, YuAn Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo
In this paper, we present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis.
1 code implementation • 21 Jan 2025 • Shiyue Zhang, Zheng Chong, Xi Lu, Wenqing Zhang, Haoxiang Li, Xujie Zhang, Jiehui Huang, Xiao Dong, Xiaodan Liang
Building on the success of diffusion models, significant advancements have been made in multimodal image generation tasks.
1 code implementation • 20 Jan 2025 • Zheng Chong, Wenqing Zhang, Shiyue Zhang, Jun Zheng, Xiao Dong, Haoxiang Li, Yiling Wu, Dongmei Jiang, Xiaodan Liang
Comprehensive experiments demonstrate that CatV2TON outperforms existing methods in both image and video try-on tasks, offering a versatile and reliable solution for realistic virtual try-ons across diverse scenarios.
no code implementations • 13 Jan 2025 • Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong Yang, Xiao Dong
We introduce RMAvatar, a novel human avatar representation with Gaussian splatting embedded on a mesh to learn a clothed avatar from a monocular video.
no code implementations • 18 Nov 2024 • Songyu Sun, Xiao Dong, Yanliang Sha, Quan Chen, Cheng Zhuo
High-speed serial links are fundamental to energy-efficient and high-performance computing systems such as artificial intelligence, 5G mobile and automotive, enabling low-latency and high-bandwidth communication.
no code implementations • 15 Sep 2024 • Wenjun Li, Ying Cai, Ziyang Wu, Wenyi Zhang, Yifan Chen, Rundong Qi, Mengqi Dong, Peigen Chen, Xiao Dong, Fenghao Shi, Lei Guo, Junwei Han, Bao Ge, Tianming Liu, Lin Gan, Tuo Zhang
Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting us personally, socially, and culturally.
1 code implementation • 21 Jul 2024 • Zheng Chong, Xiao Dong, Haoxiang Li, Shiyue Zhang, Wenqing Zhang, Xujie Zhang, Hanqing Zhao, Xiaodan Liang
Virtual try-on methods based on diffusion models achieve realistic try-on effects but often replicate the backbone network as a ReferenceNet or use additional image encoders to process condition inputs, leading to high training and inference costs.
1 code implementation • 10 Jul 2024 • Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, YaoWei Wang, Xiangyuan Lan, Xiaodan Liang
To address these challenges, we propose a novel unified open-vocabulary detection method called OV-DINO, which is pre-trained on diverse large-scale datasets with language-aware selective fusion in a unified framework.
Ranked #5 on Zero-Shot Object Detection on MSCOCO (AP metric, using extra training data)
1 code implementation • 6 Jul 2024 • Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo
However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering.
1 code implementation • 25 Apr 2024 • Jiehui Huang, Xiao Dong, Wenhui Song, Zheng Chong, Zhenchao Tang, Jun Zhou, Yuhao Cheng, Long Chen, Hanhui Li, Yiqiang Yan, Shengcai Liao, Xiaodan Liang
ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.
no code implementations • 1 Feb 2024 • Weixing Xie, Xiao Dong, Yong Yang, Qiqin Lin, Jingze Chen, Junfeng Yao, Xiaohu Guo
With the popularity of monocular videos generated by video sharing and live broadcasting applications, reconstructing and editing dynamic scenes in stationary monocular cameras has become a special but anticipated technology.
no code implementations • 1 Jun 2023 • Xiao Dong, Runhui Huang, XiaoYong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang
Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis (e.g., text-to-image generation).
no code implementations • 17 Jun 2022 • Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang
Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
no code implementations • 27 Apr 2022 • Xiao Dong, Yufei Chen, Xunzhao Yin, Cheng Zhuo
Worst-case dynamic PDN noise analysis is an essential step in PDN sign-off to ensure the performance and reliability of chips.
no code implementations • 17 Mar 2022 • Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, Lawrence Carin
Commonsense question answering requires reasoning about everyday situations and causes and effects implicit in context.
no code implementations • CVPR 2022 • Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang
Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.
1 code implementation • ICCV 2021 • Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang
In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories.
no code implementations • 1 Apr 2021 • Jiansong Li, Xiao Dong, Guangli Li, Peng Zhao, Xueying Wang, Xiaobing Chen, Xianzhi Yu, Yongxin Yang, Zihan Jiang, Wei Cao, Lei Liu, Xiaobing Feng
The training of deep neural networks (DNNs) is usually memory-hungry due to the limited device memory capacity of DNN accelerators.
no code implementations • 25 Jan 2021 • Wei Wang, Baopu Li, Shuhui Yang, Jing Sun, Zhengming Ding, Junyang Chen, Xiao Dong, Zhihui Wang, Haojie Li
From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence (discriminability) that benefits classification, and that it is sensitive to label distribution shift when the label kernel is the weighted class-conditional one.
no code implementations • 25 Apr 2019 • Xiao Dong, Lei Zhu, Xuemeng Song, Jingjing Li, Zhiyong Cheng
We propose to dynamically learn the collaborative similarity structure, and further integrate it with the ultimate feature selection into a unified framework.
no code implementations • 11 Feb 2019 • Xiao Dong, Ling Zhou
This can be regarded as a strong support of our proposal that geometrization is not only the bible for physics, it is also the key idea to understand deep learning systems.
no code implementations • 17 Jan 2019 • Xueying Wang, Lei Liu, Guangli Li, Xiao Dong, Peng Zhao, Xiaobing Feng
Background subtraction is a significant component of computer vision systems.
no code implementations • 6 Jan 2019 • Xiao Dong, Ling Zhou
By comparing the geometry of image matching and deep networks, we show that geometrization of deep networks can be used to understand existing deep learning systems and it may also help to solve the interpretability problem of deep learning systems.
no code implementations • 16 Dec 2018 • Guangli Li, Lei Liu, Xueying Wang, Xiao Dong, Peng Zhao, Xiaobing Feng
By analyzing the characteristics of layers in DNNs, an auto-tuning neural network quantization framework for collaborative inference is proposed.
no code implementations • 24 Nov 2017 • Xiao Dong, Jiasong Wu, Ling Zhou
The astonishing success of AlphaGo Zero \cite{Silver_AlphaGo} has provoked a worldwide discussion of the future of human society, with a mixed mood of hope, anxiety, excitement and fear.
no code implementations • 30 Oct 2017 • Xiao Dong, Jiasong Wu, Ling Zhou
Why and how deep learning works well on different tasks remains a mystery from a theoretical perspective.