Search Results for author: Xiao Dong

Found 27 papers, 7 papers with code

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

no code implementations 21 Feb 2025 Xiuwei Chen, Sihao Lin, Xiao Dong, Zisheng Chen, Meng Cao, Jianhua Han, Hang Xu, Xiaodan Liang

Nevertheless, training specialized subquadratic architectures from scratch for certain tasks is both resource-intensive and time-consuming.

Image Classification Mamba +4

WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction

no code implementations 3 Feb 2025 Zilong Wang, Zhiyang Dou, YuAn Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo

In this paper, we present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis.

3D Human Reconstruction Novel View Synthesis

ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions

1 code implementation 21 Jan 2025 Shiyue Zhang, Zheng Chong, Xi Lu, Wenqing Zhang, Haoxiang Li, Xujie Zhang, Jiehui Huang, Xiao Dong, Xiaodan Liang

Building on the success of diffusion models, significant advancements have been made in multimodal image generation tasks.

Image Generation

CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation

1 code implementation 20 Jan 2025 Zheng Chong, Wenqing Zhang, Shiyue Zhang, Jun Zheng, Xiao Dong, Haoxiang Li, Yiling Wu, Dongmei Jiang, Xiaodan Liang

Comprehensive experiments demonstrate that CatV2TON outperforms existing methods in both image and video try-on tasks, offering a versatile and reliable solution for realistic virtual try-ons across diverse scenarios.

Video Generation Virtual Try-on

RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video Based on Rectified Mesh-embedded Gaussians

no code implementations 13 Jan 2025 Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong Yang, Xiao Dong

We introduce RMAvatar, a novel human avatar representation with Gaussian splatting embedded on mesh to learn clothed avatar from a monocular video.

LiTformer: Efficient Modeling and Analysis of High-Speed Link Transmitters Using Non-Autoregressive Transformer

no code implementations 18 Nov 2024 Songyu Sun, Xiao Dong, Yanliang Sha, Quan Chen, Cheng Zhuo

High-speed serial links are fundamental to energy-efficient and high-performance computing systems such as artificial intelligence, 5G mobile and automotive, enabling low-latency and high-bandwidth communication.

Decoder

A Survey of Foundation Models for Music Understanding

no code implementations 15 Sep 2024 Wenjun Li, Ying Cai, Ziyang Wu, Wenyi Zhang, Yifan Chen, Rundong Qi, Mengqi Dong, Peigen Chen, Xiao Dong, Fenghao Shi, Lei Guo, Junwei Han, Bao Ge, Tianming Liu, Lin Gan, Tuo Zhang

Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting us personally, socially, and culturally.

Survey

CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

1 code implementation 21 Jul 2024 Zheng Chong, Xiao Dong, Haoxiang Li, Shiyue Zhang, Wenqing Zhang, Xujie Zhang, Hanqing Zhao, Xiaodan Liang

Virtual try-on methods based on diffusion models achieve realistic try-on effects but often replicate the backbone network as a ReferenceNet or use additional image encoders to process condition inputs, leading to high training and inference costs.

All Fashion Synthesis +2

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

1 code implementation 10 Jul 2024 Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, YaoWei Wang, Xiangyuan Lan, Xiaodan Liang

To address these challenges, we propose a novel unified open-vocabulary detection method called OV-DINO, which is pre-trained on diverse large-scale datasets with language-aware selective fusion in a unified framework.

Ranked #5 on Zero-Shot Object Detection on MSCOCO (AP metric, using extra training data)

Zero-Shot Object Detection

SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

1 code implementation 6 Jul 2024 Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering.

Dynamic Reconstruction

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

1 code implementation 25 Apr 2024 Jiehui Huang, Xiao Dong, Wenhui Song, Zheng Chong, Zhenchao Tang, Jun Zhou, Yuhao Cheng, Long Chen, Hanhui Li, Yiqiang Yan, Shengcai Liao, Xiaodan Liang

ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.

Diversity

DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras

no code implementations 1 Feb 2024 Weixing Xie, Xiao Dong, Yong Yang, Qiqin Lin, Jingze Chen, Junfeng Yao, Xiaohu Guo

With the popularity of monocular videos generated by video sharing and live broadcasting applications, reconstructing and editing dynamic scenes in stationary monocular cameras has become a special but anticipated technology.

Dynamic Reconstruction Neural Rendering

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

no code implementations 1 Jun 2023 Xiao Dong, Runhui Huang, XiaoYong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang

Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis (e.g., text-to-image generation).

Contrastive Learning Retrieval +1

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

no code implementations 17 Jun 2022 Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.

Retrieval

Worst-Case Dynamic Power Distribution Network Noise Prediction Using Convolutional Neural Network

no code implementations 27 Apr 2022 Xiao Dong, Yufei Chen, Xunzhao Yin, Cheng Zhuo

Worst-case dynamic PDN noise analysis is an essential step in PDN sign-off to ensure the performance and reliability of chips.

BIG-bench Machine Learning

elBERto: Self-supervised Commonsense Learning for Question Answering

no code implementations 17 Mar 2022 Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, Lawrence Carin

Commonsense question answering requires reasoning about everyday situations and causes and effects implicit in context.

Question Answering Representation Learning +1

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

no code implementations CVPR 2022 Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang

Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.

Contrastive Learning

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining

1 code implementation ICCV 2021 Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang

In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories.

Retrieval

Pinpointing the Memory Behaviors of DNN Training

no code implementations 1 Apr 2021 Jiansong Li, Xiao Dong, Guangli Li, Peng Zhao, Xueying Wang, Xiaobing Chen, Xianzhi Yu, Yongxin Yang, Zihan Jiang, Wei Cao, Lei Liu, Xiaobing Feng

The training of deep neural networks (DNNs) is usually memory-hungry due to the limited device memory capacity of DNN accelerators.

A Unified Joint Maximum Mean Discrepancy for Domain Adaptation

no code implementations 25 Jan 2021 Wei Wang, Baopu Li, Shuhui Yang, Jing Sun, Zhengming Ding, Junyang Chen, Xiao Dong, Zhihui Wang, Haojie Li

From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence (discriminability) that benefits classification, and that it is sensitive to the label distribution shift when the label kernel is the weighted class conditional one.

Domain Adaptation

Adaptive Collaborative Similarity Learning for Unsupervised Multi-view Feature Selection

no code implementations 25 Apr 2019 Xiao Dong, Lei Zhu, Xuemeng Song, Jingjing Li, Zhiyong Cheng

We propose to dynamically learn the collaborative similarity structure, and further integrate it with the ultimate feature selection into a unified framework.

feature selection

Understanding over-parameterized deep networks by geometrization

no code implementations 11 Feb 2019 Xiao Dong, Ling Zhou

This can be regarded as a strong support of our proposal that geometrization is not only the bible for physics, it is also the key idea to understand deep learning systems.

Geometrization of deep networks for the interpretability of deep learning systems

no code implementations 6 Jan 2019 Xiao Dong, Ling Zhou

By comparing the geometry of image matching and deep networks, we show that geometrization of deep networks can be used to understand existing deep learning systems and it may also help to solve the interpretability problem of deep learning systems.

Deep Learning

Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge

no code implementations 16 Dec 2018 Guangli Li, Lei Liu, Xueying Wang, Xiao Dong, Peng Zhao, Xiaobing Feng

By analyzing the characteristics of layers in DNNs, an auto-tuning neural network quantization framework for collaborative inference is proposed.

Collaborative Inference Quantization

Demystifying AlphaGo Zero as AlphaGo GAN

no code implementations 24 Nov 2017 Xiao Dong, Jiasong Wu, Ling Zhou

The astonishing success of AlphaGo Zero invokes a worldwide discussion of the future of our human society with a mixed mood of hope, anxiousness, excitement and fear.

How deep learning works --The geometry of deep learning

no code implementations 30 Oct 2017 Xiao Dong, Jiasong Wu, Ling Zhou

Why and how deep learning works well on different tasks remains a mystery from a theoretical perspective.

Deep Learning Template Matching
