1 code implementation • 6 Dec 2022 • Wenbo Li, Xin Yu, Kun Zhou, Yibing Song, Zhe Lin, Jiaya Jia
Generative adversarial networks (GANs) have achieved great success in image inpainting yet still have difficulty handling large missing regions.
1 code implementation • 21 Nov 2022 • Hongyu Liu, Yibing Song, Qifeng Chen
In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.
3 code implementations • 17 Nov 2022 • Shoufa Chen, Peize Sun, Yibing Song, Ping Luo
At inference, the model progressively refines a set of randomly generated boxes into the final output.
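A toy sketch of this progressive refinement, assuming a simple fixed-point step rule; the box format, step count, and fixed target below are illustrative stand-ins for the model's actual per-step predictions:

```python
import random

def refine_boxes(boxes, target, steps=10, rate=0.5):
    """Progressively move randomly generated boxes toward a predicted box.

    Each step replaces every coordinate with a convex combination of the
    current value and the (hypothetical) model prediction `target`.
    """
    for _ in range(steps):
        boxes = [
            tuple(b + rate * (t - b) for b, t in zip(box, target))
            for box in boxes
        ]
    return boxes

random.seed(0)
# Start from random boxes (x1, y1, x2, y2) and refine toward one target box.
initial = [tuple(random.uniform(0, 100) for _ in range(4)) for _ in range(3)]
refined = refine_boxes(initial, target=(10.0, 10.0, 50.0, 50.0))
```

After ten halving steps, every box has converged close to the target regardless of its random starting point.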
1 code implementation • 14 Oct 2022 • Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang
Based on the visual latent space of StyleGAN [21] and the text embedding space of CLIP [34], existing studies focus on how to map between these two latent spaces for text-driven attribute manipulation.
1 code implementation • 26 May 2022 • Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo
To address this challenge, we propose an effective adaptation approach for Transformers, namely AdaptFormer, which can efficiently adapt pre-trained ViTs to many different image and video tasks.
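A rough back-of-the-envelope sketch of why adapter-style tuning is parameter-efficient; the dimensions and the bottleneck adapter design below are hypothetical illustrations, not AdaptFormer's actual configuration:

```python
def adapter_params(d_model, bottleneck):
    """Parameters of a down-projection + up-projection bottleneck adapter."""
    down = d_model * bottleneck + bottleneck   # weights + bias
    up = bottleneck * d_model + d_model
    return down + up

def mlp_block_params(d_model, hidden):
    """Parameters of a standard transformer MLP block (two linear layers)."""
    return (d_model * hidden + hidden) + (hidden * d_model + d_model)

# Hypothetical ViT-Base-like sizes: d_model=768, MLP hidden=3072, bottleneck=64.
full = mlp_block_params(768, 3072)
extra = adapter_params(768, 64)
ratio = extra / full  # fraction of newly tunable parameters per block
```

With the frozen backbone untouched, only the small adapter (about 2% of one MLP block here) needs training per task.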
2 code implementations • 23 Mar 2022 • Zhan Tong, Yibing Song, Jue Wang, LiMin Wang
Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.
Ranked #3 on Action Recognition on AVA v2.2 (using extra training data)
1 code implementation • CVPR 2022 • Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, Jue Wang
Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations.
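The idea of sampling a forgery configuration, synthesizing an augmented forgery with it, and then predicting the configuration back can be sketched as follows; the configuration fields, option pools, and blending rule are hypothetical illustrations, not the paper's actual pool:

```python
import random

def sample_forgery_config(rng):
    """Sample one forgery configuration from a (hypothetical) pool."""
    return {
        "blend_ratio": rng.choice([0.25, 0.5, 0.75, 1.0]),
        "region": rng.choice(["mouth", "eyes", "whole_face"]),
    }

def synthesize_forgery(image, config):
    """Toy 'forgery': blend a constant artifact into the image by blend_ratio."""
    r = config["blend_ratio"]
    return [(1.0 - r) * p + r * 1.0 for p in image]

rng = random.Random(1)
config = sample_forgery_config(rng)
forged = synthesize_forgery([0.2, 0.4, 0.6], config)
# Training then enforces "sensitivity" by asking the model to predict
# `config` (blend ratio, region) from `forged`.
```

Varying the configuration pool enriches the "diversity" of forgeries seen during training.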
1 code implementation • ICLR 2022 • 16 Feb 2022 • Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie
Second, while maintaining the same computational cost, our method empowers ViTs to take more image tokens as input, drawn from higher-resolution images, to improve recognition accuracy.
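A minimal sketch of attention-guided token reorganization in this spirit: keep the highest-scoring tokens and fuse the rest into a single token. The scoring and fusion rule here are simplified illustrations, not the paper's exact procedure:

```python
def reorganize_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens; fuse the rest into one token.

    `tokens` is a list of feature vectors and `scores` a per-token
    attention score (e.g. attention from a [CLS] token). The fused token
    is the score-weighted average of the discarded tokens.
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = [tokens[i] for i in order[:keep]]
    rest = order[keep:]
    total = sum(scores[i] for i in rest) or 1.0
    fused = [
        sum(scores[i] * tokens[i][d] for i in rest) / total
        for d in range(len(tokens[0]))
    ]
    return kept + [fused]

tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
scores = [0.1, 0.4, 0.2, 0.3]
out = reorganize_tokens(tokens, scores, keep=2)
```

Dropping low-score tokens frees compute, which can instead be spent on tokens from a higher-resolution input.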
2 code implementations • 28 Jan 2022 • Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu
In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.
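A minimal illustration of such token/channel information fusion using plain weighted sums; this is a heavily simplified stand-in for the actual MLP blocks:

```python
def mix_tokens(x, w):
    """Token mixing: each output token is a weighted sum over input tokens,
    applied independently per channel. x is tokens x channels; w is tokens x tokens."""
    n, c = len(x), len(x[0])
    return [[sum(w[i][j] * x[j][d] for j in range(n)) for d in range(c)]
            for i in range(n)]

def mix_channels(x, w):
    """Channel mixing: each output channel is a weighted sum over input
    channels, applied independently per token. w is channels x channels."""
    n, c = len(x), len(x[0])
    return [[sum(w[d][e] * x[i][e] for e in range(c)) for d in range(c)]
            for i in range(n)]

# Two tokens, two channels; identity weights leave the input unchanged.
x = [[1.0, 2.0], [3.0, 4.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
y = mix_channels(mix_tokens(x, eye), eye)
```

Alternating these two simple operations is the entire information-fusion mechanism of MLP-like models.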
no code implementations • 13 Jan 2022 • Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
1 code implementation • 16 Dec 2021 • Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, Ling Shao
Analogously, VAT uses a similar feature augmentation encoder to refine the visual features, which are further applied in the visual$\rightarrow$attribute decoder to learn visual-based attribute features.
1 code implementation • NeurIPS 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo
Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.
1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao
To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information.
1 code implementation • CVPR 2021 • Jie An, Siyu Huang, Yibing Song, Dejing Dou, Wei Liu, Jiebo Luo
The forward inference projects input images into deep features, while the backward inference remaps deep features back to input images in a lossless and unbiased way.
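Lossless, unbiased invertibility of this kind can be illustrated with a generic additive coupling layer, a standard normalizing-flow building block rather than the paper's exact architecture; the inverse reconstructs the input exactly without ever inverting the inner function:

```python
def couple_forward(x1, x2, f):
    """Additive coupling: y1 = x1, y2 = x2 + f(x1)."""
    return x1, [a + b for a, b in zip(x2, f(x1))]

def couple_inverse(y1, y2, f):
    """Exact inverse: x1 = y1, x2 = y2 - f(y1); lossless by construction."""
    return y1, [a - b for a, b in zip(y2, f(y1))]

# Any inner function works, even a non-invertible one.
f = lambda v: [3.0 * t + 1.0 for t in v]
x1, x2 = [1.0, 2.0], [5.0, -4.0]
y1, y2 = couple_forward(x1, x2, f)
r1, r2 = couple_inverse(y1, y2, f)
```

Because subtraction exactly undoes addition, the backward pass remaps features to the input without information loss.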
1 code implementation • CVPR 2021 • Shuai Jia, Yibing Song, Chao Ma, Xiaokang Yang
Recently, adversarial attacks have been applied to visual object tracking to evaluate the robustness of deep trackers.
1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao, Bing Jiang, Wei Liu
While existing methods combine an input image and these low-level controls for CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content.
1 code implementation • CVPR 2021 • Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo
To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.
1 code implementation • CVPR 2021 • Tian Pan, Yibing Song, Tianyu Yang, Wenhao Jiang, Wei Liu
By empowering the temporal robustness of the encoder and modeling the temporal decay of the keys, our VideoMoCo improves MoCo temporally based on contrastive learning.
Ranked #72 on Action Recognition on HMDB-51
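A toy sketch of modeling temporal decay for keys in a contrastive memory queue; the exact weighting scheme below is an illustrative assumption, not VideoMoCo's precise formulation:

```python
def decayed_logits(sims, ages, decay=0.9):
    """Down-weight similarity logits of older keys in the memory queue.

    `sims` are query-key similarities and `ages` counts how many
    iterations ago each key entered the queue: the older the key, the
    more its out-of-date encoder makes it unreliable as a negative.
    """
    return [s * (decay ** a) for s, a in zip(sims, ages)]

# Three keys with equal similarity but increasing age.
logits = decayed_logits([1.0, 1.0, 1.0], ages=[0, 1, 5])
```

Fresh keys keep full weight while stale keys contribute less to the contrastive loss.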
1 code implementation • 9 Mar 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng
However, these systems face a threat: adversarial attacks can make the underlying CNNs vulnerable.
1 code implementation • CVPR 2021 • Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo
A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing: the try-on images produced by a parser-based method are used as supervision to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.
no code implementations • ICLR 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng
We further analyze the KL-divergence of the proposed loss function and find that the loss stabilization term makes the perturbations updated towards a fixed objective spot while deviating from the ground truth.
1 code implementation • ECCV 2020 • Yinglong Wang, Yibing Song, Chao Ma, Bing Zeng
Single image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmosphere light.
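Under a simple per-pixel fusion model, such a decomposition can be sketched as follows; the exact formulation in the paper may differ:

```python
def compose(background, streaks, transmission, airlight):
    """Per-pixel fusion of the four components: O = (B + S) * T + A * (1 - T).

    `background` B, rain `streaks` S, and `transmission` map T are
    per-pixel lists; `airlight` A is a global atmosphere-light scalar.
    """
    return [(b + s) * t + airlight * (1.0 - t)
            for b, s, t in zip(background, streaks, transmission)]

# Two pixels: the first is partly occluded by haze, the second fully clear.
observed = compose(background=[0.2, 0.5], streaks=[0.1, 0.0],
                   transmission=[0.8, 1.0], airlight=0.9)
```

Deraining then amounts to inverting this fusion: estimating B given only the observed image O.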
1 code implementation • 22 Jul 2020 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Wei Liu, Houqiang Li
Advances in visual tracking have continuously been driven by deep learning models.
1 code implementation • ECCV 2020 • Shuai Jia, Chao Ma, Yibing Song, Xiaokang Yang
On the one hand, we add temporal perturbations to the original video sequences as adversarial examples to greatly degrade the tracking performance.
1 code implementation • ECCV 2020 • Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, Chao Yang
We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively.
1 code implementation • 25 Oct 2019 • Yajing Chen, Fanzi Wu, Zeyu Wang, Yibing Song, Yonggen Ling, Linchao Bao
The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage.
1 code implementation • 23 Jul 2019 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li
In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.
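As a simplified stand-in for such a fidelity loss, a mean-squared penalty between teacher and student response maps already captures the intent: it vanishes exactly when the student reproduces the teacher's responses.

```python
def fidelity_loss(student, teacher):
    """Mean-squared difference between student and teacher response maps.

    (A simple stand-in for the paper's fidelity loss; zero exactly when
    the student matches the teacher's representation.)
    """
    assert len(student) == len(teacher)
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

teacher = [0.1, 0.9, 0.4]
loss_same = fidelity_loss(list(teacher), teacher)   # perfect imitation
loss_diff = fidelity_loss([0.0, 0.0, 0.0], teacher)  # untrained student
```

Minimizing this term pushes the small student network to maintain the representation capability of the large teacher.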
1 code implementation • CVPR 2019 • Fanzi Wu, Linchao Bao, Yajing Chen, Yonggen Ling, Yibing Song, Songnan Li, King Ngi Ngan, Wei Liu
The main ingredient of the view alignment loss is a differentiable dense optical flow estimator that can backpropagate the alignment errors between an input view and a synthetic rendering from another input view, which is projected to the target view through the 3D shape to be inferred.
1 code implementation • CVPR 2019 • Ning Wang, Yibing Song, Chao Ma, Wengang Zhou, Wei Liu, Houqiang Li
We propose an unsupervised visual tracking method in this paper.
no code implementations • 22 Nov 2018 • Yibing Song, Jiawei Zhang, Lijun Gong, Shengfeng He, Linchao Bao, Jinshan Pan, Qingxiong Yang, Ming-Hsuan Yang
We first propose a facial-component-guided deep convolutional neural network (CNN) to restore a coarse face image, denoted as the base image, where the facial components are automatically generated from the input face image.
no code implementations • NeurIPS 2018 • Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang
Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data.
no code implementations • 27 Sep 2018 • Wenxi Liu, Yibing Song, Dengsheng Chen, Shengfeng He, Yuanlong Yu, Tao Yan, Gerhard P. Hancke, Rynson W. H. Lau
In addition, we also propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance.
no code implementations • ECCV 2018 • Jianbo Jiao, Ying Cao, Yibing Song, Rynson Lau
Monocular depth estimation benefits greatly from learning-based techniques.
1 code implementation • CVPR 2018 • Jiawei Zhang, Jinshan Pan, Jimmy Ren, Yibing Song, Linchao Bao, Rynson W. H. Lau, Ming-Hsuan Yang
The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).
Ranked #7 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)
no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang
To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.
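The mask-based feature dropout can be sketched as follows; the mask generator here is uniform random, standing in for the paper's learned generative network:

```python
import random

def mask_dropout(features, drop_prob, rng):
    """Apply a randomly generated binary mask to input features.

    Zeroing feature entries simulates appearance changes, so the tracker
    learns not to over-rely on any single feature (a toy version of the
    adaptive dropout described above).
    """
    mask = [0.0 if rng.random() < drop_prob else 1.0 for _ in features]
    return [f * m for f, m in zip(features, mask)], mask

rng = random.Random(0)
features = [1.0, 2.0, 3.0, 4.0]
dropped, mask = mask_dropout(features, drop_prob=0.5, rng=rng)
```

Each masked variant of a positive sample acts as an additional positive with a different simulated appearance.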
no code implementations • CVPR 2018 • Xin Yang, Ke Xu, Yibing Song, Qiang Zhang, Xiaopeng Wei, Rynson Lau
Given an input LDR image, we first reconstruct the missing details in the HDR domain.
no code implementations • 28 Aug 2017 • Yibing Song, Linchao Bao, Shengfeng He, Qingxiong Yang, Ming-Hsuan Yang
We address the problem of transferring the style of a headshot photo to face images.
no code implementations • ICCV 2017 • Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang
Our method integrates feature extraction, response map generation, and model update into the neural networks for end-to-end training.
no code implementations • 1 Aug 2017 • Yibing Song, Jiawei Zhang, Shengfeng He, Linchao Bao, Qingxiong Yang
We propose a two-stage method for face hallucination.
no code implementations • 1 Aug 2017 • Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang
Exemplar-based face sketch synthesis methods usually face the challenge that input photos are captured under lighting conditions different from those of the training photos.