no code implementations • 10 Mar 2025 • Yuxuan Zhang, Yirui Yuan, Yiren Song, Haofan Wang, Jiaming Liu
These innovations collectively make our framework highly efficient, flexible, and suitable for a wide range of tasks.
1 code implementation • 10 Nov 2024 • Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai
Regional prompting, or compositional generation, which enables fine-grained spatial control, has gained increasing attention for its practicality in real-world applications.
1 code implementation • 4 Nov 2024 • Anthony Chen, Jianjin Xu, Wenzhao Zheng, Gaole Dai, Yida Wang, Renrui Zhang, Haofan Wang, Shanghang Zhang
Diffusion models have demonstrated excellent capabilities in text-to-image generation.
no code implementations • 9 Oct 2024 • Jen-Yuan Huang, Haofan Wang, Qixun Wang, Xu Bai, Hao Ai, Peng Xing, Jen-tse Huang
In this paper, we introduce Instant-reference Image Restoration (InstantIR), a novel diffusion-based blind image restoration (BIR) method that dynamically adjusts the generation condition during inference.
1 code implementation • 7 Oct 2024 • Yepeng Liu, Yiren Song, Hai Ci, Yu Zhang, Haofan Wang, Mike Zheng Shou, Yuheng Bu
This scheme adds varying numbers of noise steps to the latent representation of the watermarked image, followed by a controlled denoising process starting from this noisy latent representation.
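A minimal sketch of the noising-then-denoising scheme described above, assuming a generic latent-diffusion setup; the helper names (`encode_to_latent`, `denoise_from_latent`) and the variance schedule are illustrative placeholders, not the authors' released implementation.

```python
# Hedged sketch: add a variable number of forward-diffusion (noising) steps to the
# watermarked image's latent, then run a controlled denoising pass from that noisy latent.
import torch

def encode_to_latent(image: torch.Tensor) -> torch.Tensor:
    # Stand-in for a VAE encoder; a real pipeline would map pixels to latents.
    return image

def add_noise(latent: torch.Tensor, num_steps: int, beta: float = 0.02) -> torch.Tensor:
    # Forward diffusion: progressively mix Gaussian noise into the latent.
    noisy = latent
    for _ in range(num_steps):
        noisy = (1.0 - beta) ** 0.5 * noisy + beta ** 0.5 * torch.randn_like(noisy)
    return noisy

def denoise_from_latent(noisy_latent: torch.Tensor, num_steps: int) -> torch.Tensor:
    # Stand-in for the controlled denoising process; a real pipeline would
    # iterate a diffusion denoiser starting from this noisy latent.
    return noisy_latent

def regenerate(watermarked_image: torch.Tensor, num_noise_steps: int) -> torch.Tensor:
    latent = encode_to_latent(watermarked_image)
    noisy_latent = add_noise(latent, num_noise_steps)  # more steps -> stronger distortion
    return denoise_from_latent(noisy_latent, num_noise_steps)

out = regenerate(torch.randn(1, 4, 64, 64), num_noise_steps=20)
```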
no code implementations • 31 Aug 2024 • Shentong Mo, Haofan Wang
Visual sound localization is a typical and challenging problem that predicts the location of objects corresponding to the sound source in a video.
no code implementations • 29 Aug 2024 • Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li
Based on this pipeline, we construct a dataset IMAGStyle, the first large-scale style transfer dataset containing 210k image triplets, available for the community to explore and research.
1 code implementation • 30 Jun 2024 • Haofan Wang, Peng Xing, Renyuan Huang, Hao Ai, Qixun Wang, Xu Bai
Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another.
no code implementations • 12 May 2024 • Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang
Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way.
no code implementations • 5 May 2024 • Zhenyu Lou, Qiongjie Cui, Haofan Wang, Xu Tang, Hong Zhou
Predicting future human pose is a fundamental application for machine intelligence, which drives robots to plan their behavior and paths ahead of time to seamlessly accomplish human-robot collaboration in real-world 3D scenarios.
1 code implementation • 3 Apr 2024 • Haofan Wang, Matteo Spinelli, Qixun Wang, Xu Bai, Zekui Qin, Anthony Chen
Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization.
3 code implementations • 15 Jan 2024 • Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, Yao Hu
There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA.
Ranked #2 on Diffusion Personalization Tuning Free on AgeDB
no code implementations • CVPR 2024 • Zhenyu Lou, Qiongjie Cui, Haofan Wang, Xu Tang, Hong Zhou
To address this limitation, this work introduces a novel multi-modal, sense-informed motion prediction approach that conditions high-fidelity generation on two modalities, the external 3D scene and internal human gaze, and is able to recognize their salience for future human activity.
1 code implementation • 19 Dec 2023 • Pengxiang Ding, Qiongjie Cui, Min Zhang, Mengyuan Liu, Haofan Wang, Donglin Wang
Human motion forecasting, with the goal of estimating future human behavior over a period of time, is a fundamental task in many real-world applications.
no code implementations • 14 Dec 2023 • Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang
In particular, we build a tree-like Split-Ensemble architecture by performing iterative splitting and pruning from a shared backbone model, where each branch serves as a submodel corresponding to a subtask.
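A hedged PyTorch sketch of the tree-like structure described above: a shared backbone feeding several branch submodels, one per subtask. The layer sizes and the fixed two-branch split are illustrative assumptions; in the paper the splitting and pruning are performed iteratively and data-driven.

```python
import torch
import torch.nn as nn

class SplitEnsemble(nn.Module):
    """Shared backbone with per-subtask branches (illustrative sketch only)."""
    def __init__(self, num_classes_per_branch=(5, 5)):
        super().__init__()
        # Shared low-level feature extractor (the trunk of the tree).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Each branch is a submodel handling one subtask (a subset of classes).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, n))
            for n in num_classes_per_branch
        )

    def forward(self, x):
        feats = self.backbone(x)
        # Concatenate branch logits to form the full ensemble prediction.
        return torch.cat([branch(feats) for branch in self.branches], dim=1)

logits = SplitEnsemble()(torch.randn(2, 3, 32, 32))  # shape: (2, 10)
```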
1 code implementation • 17 Aug 2023 • Liang Pan, Jingbo Wang, Buzhen Huang, Junyu Zhang, Haofan Wang, Xu Tang, Yangang Wang
We present a physics-based character control framework for synthesizing human-scene interactions.
2 code implementations • 6 Feb 2023 • Qixun Wang, Xiaofeng Guo, Haofan Wang
Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes.
no code implementations • ICCV 2023 • Qiongjie Cui, Huaijiang Sun, Jianfeng Lu, Weiqing Li, Bin Li, Hongwei Yi, Haofan Wang
Current motion forecasting approaches typically train a deep end-to-end model from the source domain data, and then apply it directly to target subjects.
no code implementations • ICCV 2023 • Yangyi Huang, Hongwei Yi, Weiyang Liu, Haofan Wang, Boxi Wu, Wenxiao Wang, Binbin Lin, Debing Zhang, Deng Cai
Most of these methods fail to achieve realistic reconstruction when only a single image is available.
no code implementations • 11 Jul 2022 • Jinbin Bai, Chunhui Liu, Feiyue Ni, Haofan Wang, Mengying Hu, Xiaofeng Guo, Lele Cheng
To overcome the above issue, we present a novel mechanism for learning the translation relationship from a source modality space $\mathcal{S}$ to a target modality space $\mathcal{T}$ without the need for a joint latent space, which bridges the gap between visual and textual domains.
Ranked #13 on Zero-Shot Video Retrieval on MSVD
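A hedged sketch of the core idea in the entry above: learn a translation from a source (visual) embedding space $\mathcal{S}$ to a target (textual) embedding space $\mathcal{T}$, rather than projecting both modalities into a joint latent space. The MLP translator, the cosine objective, and the embedding dimensions are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative translator f: S -> T mapping visual embeddings into the
# textual embedding space (dimensions are arbitrary assumptions).
translator = nn.Sequential(nn.Linear(512, 1024), nn.GELU(), nn.Linear(1024, 768))
optimizer = torch.optim.AdamW(translator.parameters(), lr=1e-4)

def translation_loss(visual_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    # Pull each translated visual embedding toward its paired text embedding.
    translated = F.normalize(translator(visual_emb), dim=-1)
    target = F.normalize(text_emb, dim=-1)
    return (1.0 - (translated * target).sum(dim=-1)).mean()

# Toy training step on random stand-in embeddings.
optimizer.zero_grad()
loss = translation_loss(torch.randn(8, 512), torch.randn(8, 768))
loss.backward()
optimizer.step()
```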
no code implementations • 30 Oct 2021 • Jue Wang, Haofan Wang, Xing Wu, Chaochen Gao, Debing Zhang
In this paper, we present TransAug (Translate as Augmentation), which provides the first exploration of using translated sentence pairs as data augmentation for text, and introduce a two-stage paradigm to advance state-of-the-art sentence embeddings.
no code implementations • 10 Sep 2021 • Jue Wang, Haofan Wang, Jincan Deng, Weijia Wu, Debing Zhang
Additional rich, non-paired single-modal text data is used to boost the generalization of the text branch.
no code implementations • 24 Jun 2021 • Rakshit Naidu, Aman Priyanshu, Aadith Kumar, Sasikanth Kotti, Haofan Wang, FatemehSadat Mireshghallah
Given the increasing use of personal data for training Deep Neural Networks (DNNs) in tasks such as medical imaging and diagnosis, differentially private training of DNNs is surging in importance, and there is a large body of work focused on providing better privacy-utility trade-offs.
1 code implementation • 15 Dec 2020 • Shentong Mo, Haofan Wang, Pinxu Ren, Ta-Chung Chi
Automatic speaker verification (ASV) is the technology used to determine the identity of a person based on their voice.
2 code implementations • 25 Jun 2020 • Haofan Wang, Rakshit Naidu, Joy Michael, Soumya Snigdha Kundu
Interpretation of the underlying mechanisms of Deep Convolutional Neural Networks has become an important aspect of research in the field of deep learning due to their applications in high-risk environments.
1 code implementation • NeurIPS 2020 • Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta
Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.
1 code implementation • 4 Nov 2019 • Fan Yang, Zijian Zhang, Haofan Wang, Yuening Li, Xia Hu
XDeep is an open-source Python package developed to interpret deep models for both practitioners and researchers.
9 code implementations • 3 Oct 2019 • Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, Xia Hu
Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks and the reasons why a network makes specific decisions.
Ranked #2 on Error Understanding on CUB-200-2011 (ResNet-101)
no code implementations • 2 Oct 2019 • Zijian Zhang, Fan Yang, Haofan Wang, Xia Hu
We introduce CLE, a new model-agnostic explanation technique that explains the prediction of any classifier.
1 code implementation • 21 Jan 2019 • Haofan Wang, Zhenghua Chen, Yi Zhou
In this paper, to perform the estimation without facial landmarks, we combine the coarse and fine regression outputs within a deep network.
Ranked #3 on Head Pose Estimation on AFLW
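A hedged sketch of combining a coarse (binned classification) output with a fine regression output for a single pose angle, in the spirit of the entry above; the bin layout, feature dimension, and fusion by expected-bin-center-plus-offset are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class CoarseFineHead(nn.Module):
    """Predicts one pose angle from image features via coarse bins plus a fine offset."""
    def __init__(self, feat_dim=128, num_bins=66, bin_width_deg=3.0):
        super().__init__()
        self.coarse = nn.Linear(feat_dim, num_bins)  # coarse: which angle bin
        self.fine = nn.Linear(feat_dim, 1)           # fine: residual correction
        centers = (torch.arange(num_bins) - num_bins / 2) * bin_width_deg
        self.register_buffer("bin_centers", centers)

    def forward(self, feats):
        probs = self.coarse(feats).softmax(dim=-1)
        # Expected bin center gives the coarse estimate; add the fine residual.
        coarse_angle = (probs * self.bin_centers).sum(dim=-1, keepdim=True)
        return coarse_angle + self.fine(feats)

angle = CoarseFineHead()(torch.randn(4, 128))  # shape: (4, 1), in degrees
```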