Search Results for author: Yibing Song

Found 42 papers, 30 papers with code

SDM: Spatial Diffusion Model for Large Hole Image Inpainting

1 code implementation 6 Dec 2022 Wenbo Li, Xin Yu, Kun Zhou, Yibing Song, Zhe Lin, Jiaya Jia

Generative adversarial networks (GANs) have made great success in image inpainting yet still have difficulties tackling large missing regions.

Denoising Image Inpainting

Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

1 code implementation 21 Nov 2022 Hongyu Liu, Yibing Song, Qifeng Chen

In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.

Contrastive Learning

DiffusionDet: Diffusion Model for Object Detection

3 code implementations 17 Nov 2022 Shoufa Chen, Peize Sun, Yibing Song, Ping Luo

In inference, the model refines a set of randomly generated boxes to the output results in a progressive way.

Denoising object-detection +1

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

1 code implementation 14 Oct 2022 Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang

Based on the visual latent space of StyleGAN[21] and text embedding space of CLIP[34], studies focus on how to map these two latent spaces for text-driven attribute manipulations.

Image Manipulation

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

1 code implementation 26 May 2022 Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently.

Action Recognition Video Recognition

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

2 code implementations 23 Mar 2022 Zhan Tong, Yibing Song, Jue Wang, LiMin Wang

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.

Ranked #3 on Action Recognition on AVA v2.2 (using extra training data)

Action Classification Self-Supervised Action Recognition +2

Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

1 code implementation CVPR 2022 Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, Jue Wang

Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations.

DeepFake Detection Face Swapping +1

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

1 code implementation 16 Feb 2022 Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images.

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

2 code implementations 28 Jan 2022 Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu

In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.

Image Classification

MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning

no code implementations 13 Jan 2022 Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo

Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.

Meta-Learning

TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning

1 code implementation 16 Dec 2021 Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, Ling Shao

Analogously, VAT uses the similar feature augmentation encoder to refine the visual features, which are further applied in visual$\rightarrow$attribute decoder to learn visual-based attribute features.

Zero-Shot Learning

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

1 code implementation NeurIPS 2021 Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.

Image Classification object-detection +3

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

1 code implementation 11 Oct 2021 Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.

Image Classification object-detection +3

EViT: Expediting Vision Transformers via Token Reorganizations

1 code implementation ICLR 2022 Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images.

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

1 code implementation CVPR 2021 Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao

To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information.

Image Inpainting Image Restoration

ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

1 code implementation CVPR 2021 Jie An, Siyu Huang, Yibing Song, Dejing Dou, Wei Liu, Jiebo Luo

The forward inference projects input images into deep features, while the backward inference remaps deep features back to input images in a lossless and unbiased way.

Style Transfer

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

1 code implementation CVPR 2021 Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao, Bing Jiang, Wei Liu

While existing methods combine an input image and these low-level controls for CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content.

Texture Synthesis

Disentangled Cycle Consistency for Highly-realistic Virtual Try-On

1 code implementation CVPR 2021 Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo

To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.

Virtual Try-on

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

1 code implementation CVPR 2021 Tian Pan, Yibing Song, Tianyu Yang, Wenhao Jiang, Wei Liu

By empowering the temporal robustness of the encoder and modeling the temporal decay of the keys, our VideoMoCo improves MoCo temporally based on contrastive learning.

Action Recognition Contrastive Learning +1

Stabilized Medical Image Attacks

1 code implementation 9 Mar 2021 Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng

However, a threat to these systems arises that adversarial attacks make CNNs vulnerable.

Adversarial Attack Medical Diagnosis

Parser-Free Virtual Try-on via Distilling Appearance Flows

1 code implementation CVPR 2021 Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo

A recent pioneering work employed knowledge distillation to reduce the dependency of human parsing, where the try-on images produced by a parser-based method are used as supervisions to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.

Human Parsing Knowledge Distillation +1

Stabilized Medical Attacks

no code implementations ICLR 2021 Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng

We further analyze the KL-divergence of the proposed loss function and find that the loss stabilization term makes the perturbations updated towards a fixed objective spot while deviating from the ground truth.

Adversarial Attack Medical Diagnosis

Rethinking Image Deraining via Rain Streaks and Vapors

1 code implementation ECCV 2020 Yinglong Wang, Yibing Song, Chao Ma, Bing Zeng

Single image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmosphere light.

Image Generation Image Restoration +1

Robust Tracking against Adversarial Attacks

1 code implementation ECCV 2020 Shuai Jia, Chao Ma, Yibing Song, Xiaokang Yang

On one hand, we add the temporal perturbations into the original video sequences as adversarial examples to greatly degrade the tracking performance.

Adversarial Attack

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

1 code implementation ECCV 2020 Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, Chao Yang

We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively.

Image Inpainting

Self-supervised Learning of Detailed 3D Face Reconstruction

1 code implementation 25 Oct 2019 Yajing Chen, Fanzi Wu, Zeyu Wang, Yibing Song, Yonggen Ling, Linchao Bao

The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage.

3D Face Reconstruction Face Alignment +1

Real-Time Correlation Tracking via Joint Model Compression and Transfer

1 code implementation 23 Jul 2019 Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li

In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.

Knowledge Distillation Model Compression +2

MVF-Net: Multi-View 3D Face Morphable Model Regression

1 code implementation CVPR 2019 Fanzi Wu, Linchao Bao, Yajing Chen, Yonggen Ling, Yibing Song, Songnan Li, King Ngi Ngan, Wei Liu

The main ingredient of the view alignment loss is a differentiable dense optical flow estimator that can backpropagate the alignment errors between an input view and a synthetic rendering from another input view, which is projected to the target view through the 3D shape to be inferred.

Optical Flow Estimation regression

Joint Face Hallucination and Deblurring via Structure Generation and Detail Enhancement

no code implementations 22 Nov 2018 Yibing Song, Jiawei Zhang, Lijun Gong, Shengfeng He, Linchao Bao, Jinshan Pan, Qingxiong Yang, Ming-Hsuan Yang

We first propose a facial component guided deep Convolutional Neural Network (CNN) to restore a coarse face image, which is denoted as the base image where the facial component is automatically generated from the input face image.

Deblurring Face Hallucination +1

Deep Attentive Tracking via Reciprocative Learning

no code implementations NeurIPS 2018 Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang

Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data.

Visual Tracking

Deformable Object Tracking with Gated Fusion

no code implementations 27 Sep 2018 Wenxi Liu, Yibing Song, Dengsheng Chen, Shengfeng He, Yuanlong Yu, Tao Yan, Gerhard P. Hancke, Rynson W. H. Lau

In addition, we also propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance.

Object Tracking

Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

1 code implementation CVPR 2018 Jiawei Zhang, Jinshan Pan, Jimmy Ren, Yibing Song, Linchao Bao, Rynson W. H. Lau, Ming-Hsuan Yang

The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).

Ranked #7 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)

Deblurring

VITAL: VIsual Tracking via Adversarial Learning

no code implementations CVPR 2018 Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang

To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.

General Classification Visual Tracking

Stylizing Face Images via Multiple Exemplars

no code implementations 28 Aug 2017 Yibing Song, Linchao Bao, Shengfeng He, Qingxiong Yang, Ming-Hsuan Yang

We address the problem of transferring the style of a headshot photo to face images.

CREST: Convolutional Residual Learning for Visual Tracking

no code implementations ICCV 2017 Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang

Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training.

Visual Tracking

Fast Preprocessing for Robust Face Sketch Synthesis

no code implementations 1 Aug 2017 Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang

Exemplar-based face sketch synthesis methods usually meet the challenging problem that input photos are captured in different lighting conditions from training photos.

Face Sketch Synthesis
