1 code implementation • 12 Dec 2024 • Yuanzhi Zhu, Ruiqing Wang, Shilin Lu, Junnan Li, Hanshu Yan, Kai Zhang
Specifically, we force the predictions from our one-step student model for same input to lie on the same sampling ODE trajectory of the teacher model.
no code implementations • 8 Dec 2024 • Yuanzhi Zhu, Hanshu Yan, Huan Yang, Kai Zhang, Junnan Li
Generative models, particularly diffusion models, have made significant success in data synthesis across various modalities, including images, videos, and 3D assets.
1 code implementation • 8 Oct 2024 • Dongxu Li, Yudong Liu, HaoNing Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Fan Zhou, Chengen Huang, Yanpeng Li, Chongyan Zhu, Xiaoyi Ren, Chao Li, Yifan Ye, Peng Liu, Lihuan Zhang, Hanshu Yan, Guoyin Wang, Bei Chen, Junnan Li
Information comes in diverse modalities.
Ranked #5 on
Video Question Answering
on TVBench
1 code implementation • 28 May 2024 • Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang
In this paper, we aim to incorporate the sub-quadratic modeling capability of Gated Linear Attention (GLA) into the 2D diffusion backbone.
1 code implementation • 27 May 2024 • Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Yunchao Wei
Recent text-to-image customization works have been proven successful in generating images of given concepts by fine-tuning the diffusion models on a few examples.
1 code implementation • 22 May 2024 • Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng
Accuracy and speed are critical in image editing tasks.
1 code implementation • 13 May 2024 • Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng
We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models.
no code implementations • 9 Jan 2024 • Weimin WANG, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field.
1 code implementation • 19 Dec 2023 • Jiachun Pan, Hanshu Yan, Jun Hao Liew, Jiashi Feng, Vincent Y. F. Tan
However, since the off-the-shelf pre-trained networks are trained on clean images, the one-step estimation procedure of the clean image may be inaccurate, especially in the early stages of the generation process in diffusion models.
3 code implementations • CVPR 2024 • Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
no code implementations • 2 Sep 2023 • Hanshu Yan, Jun Hao Liew, Long Mai, Shanchuan Lin, Jiashi Feng
The flexibility of these techniques enables the editing of arbitrary regions within the frame.
no code implementations • 28 Aug 2023 • Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, Zhongcong Xu, Jiashi Feng
In this report, we present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task.
no code implementations • 28 Aug 2023 • Jianfeng Zhang, Hanshu Yan, Zhongcong Xu, Jiashi Feng, Jun Hao Liew
This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars.
1 code implementation • 20 Jul 2023 • Jiachun Pan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng, Hanshu Yan
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts.
4 code implementations • CVPR 2024 • Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
In this work, we extend this editing framework to diffusion models and propose a novel approach DragDiffusion.
1 code implementation • 7 Feb 2023 • Xiang Lan, Hanshu Yan, Shenda Hong, Mengling Feng
In this paper, we study two types of bad positive pairs that can impair the quality of time series representation learned through contrastive learning: the noisy positive pair and the faulty positive pair.
no code implementations • 20 Nov 2022 • Daquan Zhou, Weimin WANG, Hanshu Yan, Weiwei Lv, Yizhe Zhu, Jiashi Feng
In specific, unlike existing works that directly train video models in the RGB space, we use a pre-trained VAE to map video clips into a low-dimensional latent space and learn the distribution of videos' latent codes via a diffusion model.
Ranked #10 on
Text-to-Video Generation
on UCF-101
no code implementations • 19 Nov 2022 • Hanshu Yan
However, deep neural networks have been shown to be vulnerable to adversarial perturbations in input data.
2 code implementations • 28 Oct 2022 • Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
Unlike style transfer, where an image is stylized according to the reference style without changing the image content, semantic blending mixes two different concepts in a semantic manner to synthesize a novel concept while preserving the spatial layout and geometry.
no code implementations • 12 Jan 2022 • Hanshu Yan, Jingfeng Zhang, Jiashi Feng, Masashi Sugiyama, Vincent Y. F. Tan
Secondly, to robustify DIDs, we propose an adversarial training strategy, hybrid adversarial training ({\sc HAT}), that jointly trains DIDs with adversarial and non-adversarial noisy data to ensure that the reconstruction quality is high and the denoisers around non-adversarial data are locally smooth.
1 code implementation • NeurIPS 2021 • Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan
Specifically, we prove that lookahead using SGD as its inner-loop optimizer can better balance the optimization error and generalization error to achieve smaller excess risk error than vanilla SGD on (strongly) convex problems and nonconvex problems with Polyak-{\L}ojasiewicz condition which has been observed/proved in neural networks.
1 code implementation • ICLR 2022 • Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan
Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization.
1 code implementation • 3 Oct 2021 • Haiyun He, Hanshu Yan, Vincent Y. F. Tan
Using information-theoretic principles, we consider the generalization error (gen-error) of iterative semi-supervised learning (SSL) algorithms that iteratively generate pseudo-labels for a large amount of unlabelled data to progressively refine the model parameters.
no code implementations • 29 Sep 2021 • Haiyun He, Hanshu Yan, Vincent Tan
We consider iterative semi-supervised learning (SSL) algorithms that iteratively generate pseudo-labels for a large amount unlabelled data to progressively refine the model parameters.
1 code implementation • 5 Jul 2021 • Meng-Jiun Chiou, Henghui Ding, Hanshu Yan, Changhu Wang, Roger Zimmermann, Jiashi Feng
Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects.
Ranked #3 on
Unbiased Scene Graph Generation
on Visual Genome
2 code implementations • 10 Feb 2021 • Hanshu Yan, Jingfeng Zhang, Gang Niu, Jiashi Feng, Vincent Y. F. Tan, Masashi Sugiyama
By comparing \textit{non-robust} (normally trained) and \textit{robustified} (adversarially trained) models, we observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts.
1 code implementation • 24 Apr 2020 • Jiawei Du, Hanshu Yan, Vincent Y. F. Tan, Joey Tianyi Zhou, Rick Siow Mong Goh, Jiashi Feng
However, similar to existing preprocessing-based methods, the randomized process will degrade the prediction accuracy.
no code implementations • 30 Mar 2020 • Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Yunpeng Chen, Shuicheng Yan, Jiashi Feng
To successfully align the multi-modal data structures across domains, the following works exploit discriminative information in the adversarial training process, e. g., using multiple class-wise discriminators and introducing conditional information in input or output of the domain discriminator.
2 code implementations • ICLR 2020 • Hanshu Yan, Jiawei Du, Vincent Y. F. Tan, Jiashi Feng
We then provide an insightful understanding of this phenomenon by exploiting a certain desirable property of the flow of a continuous-time ODE, namely that integral curves are non-intersecting.
no code implementations • 25 Sep 2019 • Dapeng Hu, Jian Liang*, Qibin Hou, Hanshu Yan, Jiashi Feng
Previous adversarial learning methods condition domain alignment only on pseudo labels, but noisy and inaccurate pseudo labels may perturb the multi-class distribution embedded in probabilistic predictions, hence bringing insufficient alleviation to the latent mismatch problem.
no code implementations • 13 Jun 2019 • Hanshu Yan, Xuan Chen, Vincent Y. F. Tan, Wenhan Yang, Joe Wu, Jiashi Feng
They jointly facilitate unsupervised learning of a noise model for various noise types.
no code implementations • ICLR 2018 • Hanshu Yan, Jiashi Feng
The results demonstrate that the proposed interpreter successfully finds the core hidden units most responsible for prediction making.