Search Results for author: Yibing Song

Found 58 papers, 40 papers with code

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

1 code implementation • 18 Mar 2024 • Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You

Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency.

Semantic Segmentation Video Recognition

A Causal Inspired Early-Branching Structure for Domain Generalization

1 code implementation • 13 Mar 2024 • Liang Chen, Yong Zhang, Yibing Song, Zhen Zhang, Lingqiao Liu

By d-separation, we observe that the causal feature can be further characterized by being independent of the domain conditioned on the object, and we propose the following two strategies as complements for the basic framework.

Domain Generalization

HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

no code implementations • 12 Dec 2023 • Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.

Advancing Vision Transformers with Group-Mix Attention

1 code implementation • 26 Nov 2023 • Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value.

Image Classification object-detection +2

InstructDET: Diversifying Referring Object Detection with Generalized Instructions

1 code implementation • 8 Oct 2023 • Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song

In order to encompass common detection expressions, we involve emerging vision-language models (VLMs) and large language models (LLMs) to generate instructions guided by text prompts and object bounding boxes (bbxs), as the generalization ability of foundation models is effective at producing human-like expressions (e.g., describing object properties, categories, and relationships).

Language Modelling Large Language Model +4

Domain Generalization via Rationale Invariance

1 code implementation • ICCV 2023 • Liang Chen, Yong Zhang, Yibing Song, Anton Van Den Hengel, Lingqiao Liu

Specifically, we propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix.

Decision Making Domain Generalization
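
To make the idea of a per-sample rationale matrix concrete, here is a minimal sketch that assumes a linear classification head (logits = W z + b) on top of a feature vector z. Under that assumption, entry [k, d] of the matrix is the element-wise contribution W[k, d] * z[d] of feature dimension d to class k. The function name and toy shapes are hypothetical, and the snippet above does not say how the paper then regularizes these matrices, so this only illustrates the representation itself.

```python
import numpy as np


def rationale_matrix(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: element-wise contributions of features to each logit.

    Assumes a linear head with weights of shape (num_classes, feat_dim)
    and a single sample's features of shape (feat_dim,).
    Entry [k, d] equals weights[k, d] * features[d].
    """
    # Broadcast the feature vector across the class dimension.
    return weights * features[np.newaxis, :]


# Toy example: 2 classes, 3 feature dimensions.
W = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]])
z = np.array([2.0, 1.0, 4.0])
b = np.array([0.1, -0.2])

R = rationale_matrix(W, z)
logits = W @ z + b

# Summing each row of the rationale (plus the bias) recovers the logits.
assert np.allclose(R.sum(axis=1) + b, logits)
print(R)
```

Because each row sums back to its (bias-free) logit, the matrix is a lossless decomposition of the decision into per-dimension contributions, which is what lets the contributions be compared across samples.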

Advancing Visual Grounding with Scene Knowledge: Benchmark and Method

1 code implementation • CVPR 2023 • Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li

Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to reason over the long-form scene knowledge.

Image-text matching Text Matching +1

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

1 code implementation • ICCV 2023 • Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li

Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities.

Decoder Image Segmentation +3

Evolving Semantic Prototype Improves Generative Zero-Shot Learning

no code implementations • 12 Jun 2023 • Shiming Chen, Wenjin Hou, Ziming Hong, Xiaohan Ding, Yibing Song, Xinge You, Tongliang Liu, Kun Zhang

After alignment, synthesized sample features from unseen classes are closer to the real sample features, which helps DSP improve existing generative ZSL methods by 8.5%, 8.0%, and 9.7% on the standard CUB, SUN, and AWA2 datasets; this significant performance improvement indicates that evolving semantic prototypes explore a virgin field in ZSL.

Zero-Shot Learning

Improved Test-Time Adaptation for Domain Generalization

1 code implementation • CVPR 2023 • Liang Chen, Yong Zhang, Yibing Song, Ying Shan, Lingqiao Liu

Generally, the performance of a TTT strategy hinges on two main factors: selecting an appropriate auxiliary TTT task for updating and identifying reliable parameters to update during the test phase.

Image to sketch recognition Single-Source Domain Generalization +1

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning

no code implementations • 30 Mar 2023 • Chongjian Ge, Jiangliu Wang, Zhan Tong, Shoufa Chen, Yibing Song, Ping Luo

We evaluate our soft neighbor contrastive learning method (SNCLR) on standard visual recognition benchmarks, including image classification, object detection, and instance segmentation.

Contrastive Learning Image Classification +6

Human MotionFormer: Transferring Human Motions with Vision Transformers

1 code implementation • 22 Feb 2023 • Hongyu Liu, Xintong Han, ChengBin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen

In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively.

Decoder Motion Synthesis

Both Diverse and Realism Matter: Physical Attribute and Style Alignment for Rainy Image Generation

no code implementations • ICCV 2023 • Changfeng Yu, Shiming Chen, Yi Chang, Yibing Song, Luxin Yan

To solve this dilemma, we propose a physical alignment and controllable generation network (PCGNet) for diverse and realistic rain generation.

Attribute Image Generation +1

Image Inpainting via Iteratively Decoupled Probabilistic Modeling

2 code implementations • 6 Dec 2022 • Wenbo Li, Xin Yu, Kun Zhou, Yibing Song, Zhe Lin, Jiaya Jia

To achieve high-quality results with low computational cost, we present a novel pixel spread model (PSM) that iteratively employs decoupled probabilistic modeling, combining the optimization efficiency of GANs with the prediction tractability of probabilistic models.

Denoising Image Inpainting

Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

1 code implementation • CVPR 2023 • Hongyu Liu, Yibing Song, Qifeng Chen

In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.

Contrastive Learning

DiffusionDet: Diffusion Model for Object Detection

3 code implementations • ICCV 2023 • Shoufa Chen, Peize Sun, Yibing Song, Ping Luo

We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes.

Denoising Object +2
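
The phrase "a denoising diffusion process from noisy boxes to object boxes" rests on a forward noising step that corrupts ground-truth boxes. The sketch below shows a generic DDPM-style forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied to box coordinates. The linear β schedule, the normalized (cx, cy, w, h) box format, and the helper name `noise_boxes` are assumptions for illustration; DiffusionDet's own recipe (schedule, box padding, scaling) is not described in this snippet, and the learned reverse (denoising) detector is not shown.

```python
import numpy as np


def noise_boxes(boxes, t, alpha_bars, rng):
    """Hypothetical helper: sample x_t ~ q(x_t | x_0) for a set of boxes.

    boxes: (N, 4) array of normalized (cx, cy, w, h) ground-truth boxes (x_0).
    t: integer timestep index into alpha_bars.
    Returns the noisy boxes and the Gaussian noise that was added.
    """
    a_bar = alpha_bars[t]
    eps = rng.standard_normal(boxes.shape)
    noisy = np.sqrt(a_bar) * boxes + np.sqrt(1.0 - a_bar) * eps
    return noisy, eps


T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)  # cumulative product, decreasing in t

rng = np.random.default_rng(0)
gt = np.array([[0.5, 0.5, 0.2, 0.3], [0.25, 0.7, 0.1, 0.1]])

# Early timestep: boxes stay close to the ground truth.
noisy_early, eps_early = noise_boxes(gt, 0, alpha_bars, rng)
# Last timestep: boxes are almost pure Gaussian noise.
noisy_late, eps_late = noise_boxes(gt, T - 1, alpha_bars, rng)
```

At inference, a detector in this framing would start from boxes like `noisy_late` (random boxes) and iteratively refine them toward object boxes, which is the reverse of the process sketched here.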

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

1 code implementation • 14 Oct 2022 • Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang

Based on the visual latent space of StyleGAN[21] and text embedding space of CLIP[34], studies focus on how to map these two latent spaces for text-driven attribute manipulations.

Attribute Image Manipulation

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

2 code implementations • 26 May 2022 • Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can efficiently adapt pre-trained ViTs to many different image and video tasks.

Action Recognition Video Recognition

Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

1 code implementation • CVPR 2022 • Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, Jue Wang

Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations.

DeepFake Detection Face Swapping +1

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

4 code implementations • 23 Mar 2022 • Zhan Tong, Yibing Song, Jue Wang, LiMin Wang

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.

4k Action Classification +3

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

1 code implementation • 16 Feb 2022 • Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images.

Efficient ViTs

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

2 code implementations • 28 Jan 2022 • Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu

In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.

Image Classification

MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning

no code implementations • 13 Jan 2022 • Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo

Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.


TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning

1 code implementation • 16 Dec 2021 • Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, Ling Shao

Analogously, VAT uses a similar feature augmentation encoder to refine the visual features, which are further applied in the visual→attribute decoder to learn visual-based attribute features.

Attribute Decoder +1

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

1 code implementation • NeurIPS 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.

Image Classification object-detection +3

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

1 code implementation • 11 Oct 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo

Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.

Image Classification object-detection +3

EViT: Expediting Vision Transformers via Token Reorganizations

1 code implementation • ICLR 2022 • Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images.

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao

To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information.

Image Inpainting Image Restoration

ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

1 code implementation • CVPR 2021 • Jie An, Siyu Huang, Yibing Song, Dejing Dou, Wei Liu, Jiebo Luo

The forward inference projects input images into deep features, while the backward inference remaps deep features back to input images in a lossless and unbiased way.

Style Transfer
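
The snippet's claim that the backward inference remaps deep features back to images "in a lossless and unbiased way" follows from building the network out of exactly invertible layers. A standard building block with that property is an additive coupling layer (as in NICE/Glow-style flows); the sketch below uses it to show why inversion is exact. Whether ArtFlow uses this particular layer is not stated in the snippet, and `shift_net` is a hypothetical stand-in for a learned sub-network.

```python
import numpy as np


def shift_net(x_a):
    # Hypothetical stand-in for a learned sub-network. Any function works here:
    # invertibility of the coupling does not depend on shift_net being invertible.
    return np.tanh(x_a) * 2.0 + 0.5


def coupling_forward(x):
    """Additive coupling: keep the first half, shift the second half."""
    x_a, x_b = np.split(x, 2, axis=-1)
    y_b = x_b + shift_net(x_a)
    return np.concatenate([x_a, y_b], axis=-1)


def coupling_backward(y):
    """Exact inverse: recompute the same shift from the untouched half and subtract."""
    y_a, y_b = np.split(y, 2, axis=-1)
    x_b = y_b - shift_net(y_a)
    return np.concatenate([y_a, x_b], axis=-1)


rng = np.random.default_rng(42)
x = rng.standard_normal((3, 8))  # 3 toy "inputs" with 8 channels each
y = coupling_forward(x)
x_rec = coupling_backward(y)
```

Because the first half passes through unchanged, the backward pass can recompute exactly the same shift and undo it, so `x_rec` matches `x` up to floating-point rounding. Stacking such layers (with channel permutations in between) keeps the whole mapping invertible.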

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao, Bing Jiang, Wei Liu

While existing methods combine an input image and these low-level controls for CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content.

Decoder Texture Synthesis

Disentangled Cycle Consistency for Highly-realistic Virtual Try-On

1 code implementation • CVPR 2021 • Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo

To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.

Virtual Try-on

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

1 code implementation • CVPR 2021 • Tian Pan, Yibing Song, Tianyu Yang, Wenhao Jiang, Wei Liu

By empowering the temporal robustness of the encoder and modeling the temporal decay of the keys, our VideoMoCo improves MoCo temporally based on contrastive learning.

Action Recognition Contrastive Learning +1

Stabilized Medical Image Attacks

1 code implementation • 9 Mar 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng

However, these systems face a threat: adversarial attacks can make CNNs vulnerable.

Adversarial Attack Medical Diagnosis

Parser-Free Virtual Try-on via Distilling Appearance Flows

2 code implementations • CVPR 2021 • Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo

A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing, where the try-on images produced by a parser-based method are used as supervision to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.

Human Parsing Knowledge Distillation +1

Stabilized Medical Attacks

no code implementations • ICLR 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng

We further analyze the KL-divergence of the proposed loss function and find that the loss stabilization term makes the perturbations updated towards a fixed objective spot while deviating from the ground truth.

Adversarial Attack Medical Diagnosis

Rethinking Image Deraining via Rain Streaks and Vapors

1 code implementation • ECCV 2020 • Yinglong Wang, Yibing Song, Chao Ma, Bing Zeng

Single image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmosphere light.

Image Generation Image Restoration +1

Robust Tracking against Adversarial Attacks

2 code implementations • ECCV 2020 • Shuai Jia, Chao Ma, Yibing Song, Xiaokang Yang

On one hand, we add the temporal perturbations into the original video sequences as adversarial examples to greatly degrade the tracking performance.

Adversarial Attack

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

1 code implementation • ECCV 2020 • Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, Chao Yang

We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively.

Decoder Image Inpainting

Self-supervised Learning of Detailed 3D Face Reconstruction

1 code implementation • 25 Oct 2019 • Yajing Chen, Fanzi Wu, Zeyu Wang, Yibing Song, Yonggen Ling, Linchao Bao

The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage.

3D Face Reconstruction Face Alignment +1

Real-Time Correlation Tracking via Joint Model Compression and Transfer

1 code implementation • 23 Jul 2019 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li

In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.

Computational Efficiency Image Classification +4

MVF-Net: Multi-View 3D Face Morphable Model Regression

1 code implementation • CVPR 2019 • Fanzi Wu, Linchao Bao, Yajing Chen, Yonggen Ling, Yibing Song, Songnan Li, King Ngi Ngan, Wei Liu

The main ingredient of the view alignment loss is a differentiable dense optical flow estimator that can backpropagate the alignment errors between an input view and a synthetic rendering from another input view, which is projected to the target view through the 3D shape to be inferred.

Optical Flow Estimation regression

Joint Face Hallucination and Deblurring via Structure Generation and Detail Enhancement

no code implementations • 22 Nov 2018 • Yibing Song, Jiawei Zhang, Lijun Gong, Shengfeng He, Linchao Bao, Jinshan Pan, Qingxiong Yang, Ming-Hsuan Yang

We first propose a facial component guided deep Convolutional Neural Network (CNN) to restore a coarse face image, which is denoted as the base image where the facial component is automatically generated from the input face image.

Deblurring Face Hallucination +2

Deep Attentive Tracking via Reciprocative Learning

no code implementations • NeurIPS 2018 • Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang

Visual attention, derived from cognitive neuroscience, facilitates human perception of the most pertinent subset of the sensory data.

Visual Tracking

Deformable Object Tracking with Gated Fusion

no code implementations • 27 Sep 2018 • Wenxi Liu, Yibing Song, Dengsheng Chen, Shengfeng He, Yuanlong Yu, Tao Yan, Gerhard P. Hancke, Rynson W. H. Lau

In addition, we also propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance.

Object Object Tracking

Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

1 code implementation • CVPR 2018 • Jiawei Zhang, Jinshan Pan, Jimmy Ren, Yibing Song, Linchao Bao, Rynson W. H. Lau, Ming-Hsuan Yang

The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).

Ranked #11 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)


VITAL: VIsual Tracking via Adversarial Learning

no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang

To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.

General Classification Visual Tracking

Stylizing Face Images via Multiple Exemplars

no code implementations • 28 Aug 2017 • Yibing Song, Linchao Bao, Shengfeng He, Qingxiong Yang, Ming-Hsuan Yang

We address the problem of transferring the style of a headshot photo to face images.

Fast Preprocessing for Robust Face Sketch Synthesis

no code implementations • 1 Aug 2017 • Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang

Exemplar-based face sketch synthesis methods usually face the challenging problem that input photos are captured under lighting conditions different from those of the training photos.

Face Sketch Synthesis

CREST: Convolutional Residual Learning for Visual Tracking

no code implementations • ICCV 2017 • Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang

Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training.

Visual Tracking
