Search Results for author: Bin Xiao

Found 37 papers, 25 papers with code

A Physical-World Adversarial Attack Against 3D Face Recognition

1 code implementation 26 May 2022 YanJie Li, Yiquan Li, Bin Xiao

Then we reverse the 3D adversarial examples to the projector's input to place noises on phase-shift images, which models the process of structured light imaging.

Adversarial Attack Face Recognition

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

no code implementations 22 Apr 2022 Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data.

Question Answering Visual Commonsense Reasoning +3

DaViT: Dual Attention Vision Transformers

2 code implementations 7 Apr 2022 Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan

We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention.

Ranked #8 on Image Classification on ImageNet (using extra training data)

Image Classification Instance Segmentation +2
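
The DaViT summary above contrasts attention over spatial tokens with attention over channel tokens. The sketch below illustrates only that contrast, under simplifying assumptions that are not taken from the paper: a single head, identity query/key/value projections, no windows or channel groups, and a scaling factor chosen for the example. The helper names (`attention`, `spatial_attention`, `channel_attention`) are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def attention(tokens):
    """Self-attention with identity projections (assumption: q = k = v = tokens)."""
    dim = len(tokens[0])
    scores = matmul(tokens, transpose(tokens))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, tokens)


def spatial_attention(x):
    """x is N positions by C channels; each token is one spatial position."""
    return attention(x)


def channel_attention(x):
    """Each token is one channel, i.e. a length-N vector covering every position."""
    return transpose(attention(transpose(x)))


# Toy feature map: 4 spatial positions, 3 channels.
x = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0], [1.5, 0.5, 0.5]]
spatial_out = spatial_attention(x)
channel_out = channel_attention(x)
```

In this toy form, the channel score matrix is C by C and every score is a dot product over all N positions, which is the sense in which channel attention sees the whole image at once.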

Unified Contrastive Learning in Image-Text-Label Space

1 code implementation CVPR 2022 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao

Particularly, it attains gains of up to 9.2% and 14.5% on average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.

Contrastive Learning Image Classification +2

Table Structure Recognition with Conditional Attention

no code implementations 8 Mar 2022 Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

The Table Structure Recognition (TSR) problem aims to recognize the structure of a table and transform unstructured tables into a structured, machine-readable format so that the tabular data can be further analysed by downstream tasks, such as semantic modeling and information retrieval.

Information Retrieval

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

no code implementations 15 Jan 2022 Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that our proposed CLIP-TD leads to exceptional gains in the low-shot (up to 51.9%) and domain-shifted (up to 71.3%) conditions of VCR, while simultaneously improving performance under standard fully-supervised conditions (up to 2%), achieving state-of-the-art performance on VCR compared to other single models that are pretrained with image-text data only.

Question Answering Visual Commonsense Reasoning +3

Focal Attention for Long-Range Interactions in Vision Transformers

1 code implementation NeurIPS 2021 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal attention, we propose a new variant of Vision Transformer models, called Focal Transformers, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.

Image Classification object-detection +2

Florence: A New Foundation Model for Computer Vision

1 code implementation 22 Nov 2021 Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Action Classification Action Recognition +12

Generating Unrestricted 3D Adversarial Point Clouds

1 code implementation 17 Nov 2021 Xuelong Dai, YanJie Li, Hua Dai, Bin Xiao

The unrestricted adversarial attack loss is incorporated in the special adversarial training of GAN, which enables the generator to generate the adversarial examples to spoof the target network.

Adversarial Attack

MA-CLIP: Towards Modality-Agnostic Contrastive Language-Image Pre-training

no code implementations 29 Sep 2021 Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Focal Self-attention for Local-Global Interactions in Vision Transformers

3 code implementations 1 Jul 2021 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal self-attention, we propose a new variant of Vision Transformer models, called Focal Transformer, which achieves superior performance over the state-of-the-art vision Transformers on a range of public image classification and object detection benchmarks.

Image Classification Instance Segmentation +3

Long-term Cross Adversarial Training: A Robust Meta-learning Method for Few-shot Classification Tasks

1 code implementation ICML Workshop AML 2021 Fan Liu, Shuyu Zhao, Xuelong Dai, Bin Xiao

Although adversarial training (AT) methods such as Adversarial Query (AQ) can improve the adversarial robustness of meta-learning models, AT remains computationally expensive.

Adversarial Robustness Classification +1

Dynamic Head: Unifying Object Detection Heads with Attentions

3 code implementations CVPR 2021 Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang

In this paper, we present a novel dynamic head framework to unify object detection heads with attentions.

Ranked #7 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

1 code implementation CVPR 2021 Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang

Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions.

Keypoint Detection

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

3 code implementations ICCV 2021 Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision Longformer, which significantly enhances the ViT of Dosovitskiy et al. (2020) for encoding high-resolution images using two techniques.

Image Classification Instance Segmentation +3

CvT: Introducing Convolutions to Vision Transformers

10 code implementations ICCV 2021 Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Ranked #2 on Image Classification on Flowers-102 (using extra training data)

Image Classification

Consistent Instance Classification for Unsupervised Representation Learning

no code implementations 1 Jan 2021 Depu Meng, Zigang Geng, Zhirong Wu, Bin Xiao, Houqiang Li, Jingdong Wang

The proposed consistent instance classification (ConIC) approach simultaneously optimizes the classification loss and an additional consistency loss explicitly penalizing the feature dissimilarity between the augmented views from the same instance.

Classification General Classification +1
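
The ConIC summary above describes a classification loss combined with a consistency loss between two augmented views of the same instance. The sketch below shows one way such a combined objective could look. The snippet does not say how dissimilarity is measured or how the two terms are weighted, so the cosine dissimilarity, the weight `lam`, and the function name `conic_style_loss` are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; an assumed stand-in for 'feature dissimilarity'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def conic_style_loss(logits_a, logits_b, feat_a, feat_b, instance_id, lam=1.0):
    """Instance-classification loss on both views plus a weighted consistency term.

    logits_*: per-instance class scores for two augmented views of one image.
    feat_*:   feature vectors of the same two views.
    lam:      hypothetical weight on the consistency term.
    """
    classification = cross_entropy(logits_a, instance_id) + cross_entropy(logits_b, instance_id)
    consistency = cosine_dissimilarity(feat_a, feat_b)
    return classification + lam * consistency
```

With this form, two views whose features point in the same direction add nothing beyond the classification term, while diverging features are penalized in proportion to `lam`.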

Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization

no code implementations ICCV 2021 Xiuli Bi, Zhipeng Zhang, Bin Xiao

For detecting the tampered regions, a forgery localization generator GM is proposed based on a multi-decoder-single-task strategy.

Style Transfer

DTMNet: A Discrete Tchebichef Moments-Based Deep Neural Network for Multi-Focus Image Fusion

no code implementations ICCV 2021 Bin Xiao, Haifeng Wu, Xiuli Bi

The proposed DTMNet is an end-to-end deep neural network with only one convolutional layer and three fully connected layers.

Color-related Local Binary Pattern: A Learned Local Descriptor for Color Image Recognition

no code implementations 11 Dec 2020 Bin Xiao, Tao Geng, Xiuli Bi, Weisheng Li

In this paper, a color-related local binary pattern (cLBP), which learns the dominant patterns from the decoded LBP, is proposed for color image recognition.

D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization

no code implementations 3 Dec 2020 Bo Liu, Ranglei Wu, Xiuli Bi, Bin Xiao, Weisheng Li, Guoyin Wang, Xinbo Gao

The unfixed encoder autonomously learns the image fingerprints that differentiate between the tampered and non-tampered regions, whereas the fixed encoder intentionally provides the direction information that assists the learning and detection of the network.

Proxy Network for Few Shot Learning

1 code implementation 9 Sep 2020 Bin Xiao, Chien-Liang Liu, Wen-Hoar Hsaio

We conclude that the success of metric-learning based approaches lies in the data embedding, the representative of each class, and the distance metric.

Few-Shot Learning Metric Learning

RRU-Net: The Ringed Residual U-Net for Image Splicing Forgery Detection

1 code implementation CVPR 2019 Workshop Xiuli Bi, Yang Wei, Bin Xiao, Weisheng Li

The core idea of RRU-Net is to strengthen how the CNN learns, inspired by the recall and consolidation mechanisms of the human brain and implemented through the propagation and feedback of residuals in the CNN.

Deep High-Resolution Representation Learning for Visual Recognition

33 code implementations 20 Aug 2019 Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Instance Segmentation object-detection +5

Deep High-Resolution Representation Learning for Human Pose Estimation

34 code implementations CVPR 2019 Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang

We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel.

Instance Segmentation Multi-Person Pose Estimation +3

Simple Baselines for Human Pose Estimation and Tracking

18 code implementations ECCV 2018 Bin Xiao, Haiping Wu, Yichen Wei

There has been significant progress on pose estimation and increasing interest in pose tracking in recent years.

Pose Tracking

Interleaved Group Convolutions

no code implementations ICCV 2017 Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang

The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.

Interleaved Group Convolutions for Deep Neural Networks

2 code implementations 10 Jul 2017 Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang

The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.
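
The two interleaved group convolution summaries above name a primary and a secondary group convolution but do not spell out how channels are assigned to groups. The sketch below shows one plausible reading of "interleaved", stated here as an assumption: each secondary partition draws exactly one channel from every primary partition. It computes channel indices only and performs no convolution; the function names and the parameters `num_primary` and `channels_per_primary` are hypothetical.

```python
def primary_partitions(num_primary, channels_per_primary):
    """Contiguous channel blocks: partition l holds channels l*M .. l*M + M - 1."""
    return [
        [l * channels_per_primary + m for m in range(channels_per_primary)]
        for l in range(num_primary)
    ]


def secondary_partitions(num_primary, channels_per_primary):
    """Interleaved blocks: partition m takes the m-th channel of every primary partition."""
    return [
        [l * channels_per_primary + m for l in range(num_primary)]
        for m in range(channels_per_primary)
    ]


# Toy setting: 2 primary partitions of 3 channels each (6 channels total).
primary = primary_partitions(2, 3)
secondary = secondary_partitions(2, 3)
```

Under this assignment, applying a group convolution over the primary partitions and then one over the secondary partitions lets every output channel depend on every input channel, even though each individual convolution only mixes channels within its own groups.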
