Search Results for author: Baining Guo

Found 18 papers, 11 papers with code

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition

no code implementations 22 Apr 2022 Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

Second, we convert the image classification problem from learning parametric category classifier weights to learning a text encoder as a meta network to generate category classifier weights.

Action Recognition Classification +6

Protecting Celebrities from DeepFake with Identity Consistency Transformer

1 code implementation 2 Mar 2022 Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detects a suspect face by finding identity inconsistency between the inner and outer face regions.

Face Swapping

StyleSwin: Transformer-based GAN for High-resolution Image Generation

1 code implementation arXiv 2021 BoWen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo

To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity.

Image Generation

VirtualCube: An Immersive 3D Video Communication System

no code implementations 13 Dec 2021 Yizhong Zhang, Jiaolong Yang, Zhen Liu, Ruicheng Wang, Guojun Chen, Xin Tong, Baining Guo

The VirtualCube system is a 3D video conference system that attempts to overcome some limitations of conventional technologies.

Depth Estimation

Vector Quantized Diffusion Model for Text-to-Image Synthesis

2 code implementations 29 Nov 2021 Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.

Ranked #1 on Text-to-Image Generation on Oxford 102 Flowers (using extra training data)

Denoising Text to image generation +1

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

no code implementations 19 Nov 2021 Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.

Super-Resolution Text to Video Retrieval +1

Swin Transformer V2: Scaling Up Capacity and Resolution

7 code implementations 18 Nov 2021 Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.

Action Classification Image Classification +3

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

4 code implementations 1 Jul 2021 Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo

By further pretraining on the larger dataset ImageNet-21K, we achieve 87.5% Top-1 accuracy on ImageNet-1K and high segmentation performance on ADE20K with 55.7 mIoU.

Image Classification Semantic Segmentation

Aggregated Contextual Transformations for High-Resolution Image Inpainting

2 code implementations 3 Apr 2021 Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.

Image Inpainting Texture Synthesis

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

53 code implementations ICCV 2021 Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo

This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.

Ranked #3 on Semantic Segmentation on FoodSeg103 (using extra training data)

Image Classification Instance Segmentation +2

Identity-Driven DeepFake Detection

no code implementations 7 Dec 2020 Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

Our approach takes as input the suspect image/video as well as the target identity information (a reference image or video).

DeepFake Detection Face Swapping

Learning Texture Transformer Network for Image Super-Resolution

1 code implementation CVPR 2020 Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, Baining Guo

In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively.

Hard Attention Image Generation +2

Face X-ray for More General Face Forgery Detection

3 code implementations CVPR 2020 Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, Baining Guo

For this reason, face X-ray provides an effective way for detecting forgery generated by most existing face manipulation algorithms.

DeepFake Detection Face Swapping

Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting

2 code implementations CVPR 2019 Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

As the missing content can be filled by attention transfer from deep to shallow in a pyramid fashion, both visual and semantic coherence for image inpainting can be ensured.

Image Inpainting

Compressing Neural Networks using the Variational Information Bottleneck

1 code implementation ICML 2018 Bin Dai, Chen Zhu, Baining Guo, David Wipf

Neural networks can be compressed to reduce memory and computational requirements, or to increase accuracy by facilitating the use of a larger base architecture.

Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-encoders

no code implementations ICCV 2015 Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo

With the growing popularity of short-form video sharing platforms such as Instagram and Vine, there has been an increasing need for techniques that automatically extract highlights from video.

Orientational Pyramid Matching for Recognizing Indoor Scenes

no code implementations CVPR 2014 Lingxi Xie, Jingdong Wang, Baining Guo, Bo Zhang, Qi Tian

The novelty lies in that OPM uses the 3D orientations to form the pyramid and produce the pooling regions, which is unlike SPM that uses the spatial positions to form the pyramid.

General Classification Scene Classification +1
