Search Results for author: Song Bai

Found 73 papers, 43 papers with code

MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

no code implementations3 Feb 2023 Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Philip H. S. Torr, Song Bai

However, since the target objects in these existing datasets are usually relatively salient, dominant, and isolated, VOS under complex scenes has rarely been studied.

Semantic Segmentation Video Object Segmentation +1

PV3D: A 3D Generative Model for Portrait Video Generation

no code implementations13 Dec 2022 Eric Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou

While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos.

Video Generation

LUMix: Improving Mixup by Better Modelling Label Uncertainty

no code implementations29 Nov 2022 Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

LUMix is simple as it can be implemented in just a few lines of code and can be universally applied to any deep networks \eg CNNs and Vision Transformers, with minimal computational cost.

Data Augmentation

Language-driven Open-Vocabulary 3D Scene Understanding

1 code implementation29 Nov 2022 Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.

Contrastive Learning Instance Segmentation +3

The Runner-up Solution for YouTube-VIS Long Video Challenge 2022

no code implementations18 Nov 2022 Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai

This technical report describes our 2nd-place solution for the ECCV 2022 YouTube-VIS Long Video Challenge.

Contrastive Learning Instance Segmentation +2

Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning

1 code implementation24 Oct 2022 Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

At the core is a parsimonious representation that encodes a line segment using a closed-form 4D geometric vector, which enables lifting line segments in wireframe to an end-to-end trainable holistic attraction field that has built-in geometry-awareness, context-awareness and robustness.

Self-Supervised Learning Wireframe Parsing

Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning

1 code implementation1 Oct 2022 Yujun Shi, Jian Liang, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

To remedy this problem caused by the data heterogeneity, we propose {\sc FedDecorr}, a novel method that can effectively mitigate dimensional collapse in federated learning.

Federated Learning

Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

no code implementations4 Aug 2022 Zhangzi Zhu, Yu Hao, Wenqing Zhang, Chuhui Xue, Song Bai

This report presents our 2nd place solution to ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST) : Cropped Word Recognition.

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

no code implementations29 Jul 2022 Qihao Liu, Yi Zhang, Song Bai, Alan Yuille

Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions.

3D Human Pose Estimation Data Augmentation

Contextual Text Block Detection towards Scene Text Understanding

no code implementations26 Jul 2022 Chuhui Xue, Jiaxing Huang, Shijian Lu, Changhu Wang, Song Bai

We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.

text-classification Text Classification +1

In Defense of Online Models for Video Instance Segmentation

1 code implementation21 Jul 2022 Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Association Contrastive Learning +5

CenterNet++ for Object Detection

1 code implementation18 Apr 2022 Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

Our approach, named CenterNet, detects each object as a triplet keypoints (top-left and bottom-right corners and the center keypoint).

object-detection Object Detection

An Empirical Study of End-to-End Temporal Action Detection

1 code implementation CVPR 2022 Xiaolong Liu, Song Bai, Xiang Bai

Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification, and only the detection head upon the encoder is optimized for TAD.

Action Classification Action Detection +2

Fourier Document Restoration for Robust Document Dewarping and Recognition

1 code implementation CVPR 2022 Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai

State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents which are prone to errors while dealing with documents with irregular distortions or large variations in depth.

Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

no code implementations8 Mar 2022 Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai

Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations.

Optical Character Recognition Scene Text Detection

SeqFormer: Sequential Transformer for Video Instance Segmentation

2 code implementations15 Dec 2021 Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms shall be done with each frame independently.

Instance Segmentation Semantic Segmentation +1

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

1 code implementation CVPR 2022 Yujun Shi, Kuangqi Zhou, Jian Liang, Zihang Jiang, Jiashi Feng, Philip Torr, Song Bai, Vincent Y. F. Tan

Specifically, we experimentally show that directly encouraging CIL Learner at the initial phase to output similar representations as the model jointly trained on all classes can greatly boost the CIL performance.

class-incremental learning Incremental Learning

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

3 code implementations CVPR 2022 Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, and following re-identification (re-ID) for object association.

Association Multi-Object Tracking +3

TransMix: Attend to Mix for Vision Transformers

2 code implementations CVPR 2022 Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.

Instance Segmentation object-detection +2

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations15 Nov 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Association Instance Segmentation +4

Contextual Text Detection

no code implementations29 Sep 2021 Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Song Bai, Changhu Wang

This paper presents Contextual Text Detection, a new setup that detects contextual text blocks for better understanding of texts in scenes.

CAP-Net: Correspondence-Aware Point-view Fusion Network for 3D Shape Analysis

no code implementations3 Sep 2021 Xinwei He, Silin Cheng, Song Bai, Xiang Bai

The core element of CAP-Net is a module named Correspondence-Aware Fusion (CAF) which integrates the local features of the two modalities based on their correspondence scores.

3D Object Classification Retrieval

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

no code implementations ICCV 2021 Bin Tan, Nan Xue, Song Bai, Tianfu Wu, Gui-Song Xia

This paper presents a neural network built upon Transformers, namely PlaneTR, to simultaneously detect and reconstruct planes from a single image.

Visual Parser: Representing Part-whole Hierarchies with Transformers

2 code implementations13 Jul 2021 Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr

To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.

Image Classification Instance Segmentation +3

End-to-end Temporal Action Detection with Transformer

1 code implementation18 Jun 2021 Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video.

Action Detection Temporal Action Localization +1

I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition

no code implementations18 May 2021 Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Changhu Wang, Song Bai

The first task focuses on image-to-character (I2C) mapping which detects a set of character candidates from images based on different alignments of visual features in an non-sequential way.

Scene Text Recognition

Location-Sensitive Visual Recognition with Cross-IOU Loss

1 code implementation11 Apr 2021 Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks.

Instance Segmentation object-detection +3

Anchor-Free Person Search

1 code implementation CVPR 2021 Yichao Yan, Jinpeng Li, Jie Qin, Song Bai, Shengcai Liao, Li Liu, Fan Zhu, Ling Shao

Person search aims to simultaneously localize and identify a query person from realistic, uncropped images, which can be regarded as the unified task of pedestrian detection and person re-identification (re-id).

Pedestrian Detection Person Re-Identification +1

SwiftNet: Real-time Video Object Segmentation

1 code implementation CVPR 2021 Haochen Wang, XiaoLong Jiang, Haibing Ren, Yao Hu, Song Bai

In this work we present SwiftNet for real-time semisupervised video object segmentation (one-shot VOS), which reports 77. 8% J &F and 70 FPS on DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed performance.

Semantic Segmentation Semi-Supervised Video Object Segmentation +1

Occluded Video Instance Segmentation: A Benchmark

1 code implementation2 Feb 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16. 3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Association Instance Segmentation +3

Dual Attention GANs for Semantic Image Synthesis

1 code implementation29 Aug 2020 Hao Tang, Song Bai, Nicu Sebe

We also propose two novel modules, i. e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively.

Image Generation

Bipartite Graph Reasoning GANs for Person Image Generation

1 code implementation10 Aug 2020 Hao Tang, Song Bai, Philip H. S. Torr, Nicu Sebe

We present a novel Bipartite Graph Reasoning GAN (BiGraphGAN) for the challenging person image generation task.

 Ranked #1 on Pose Transfer on Market-1501 (PCKh metric)

Pose Transfer

FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition

no code implementations22 Jul 2020 Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai

In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them stay on local devices.

Federated Learning Privacy Preserving +1

XingGAN for Person Image Generation

2 code implementations ECCV 2020 Hao Tang, Song Bai, Li Zhang, Philip H. S. Torr, Nicu Sebe

We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i. e., translating the pose of a given person to a desired one.

 Ranked #1 on Pose Transfer on Market-1501 (IS metric)

Pose Transfer

Neural Architecture Search for Lightweight Non-Local Networks

2 code implementations CVPR 2020 Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille

However, it has been rarely explored to embed the NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally have heavy computation cost which makes it difficult to be applied in applications where computational resources are limited, and 2) it is an open problem to discover an optimal configuration to embed NL blocks into mobile neural networks.

Image Classification Neural Architecture Search

Holistically-Attracted Wireframe Parsing

1 code implementation CVPR 2020 Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

For computing line segment proposals, a novel exact dual representation is proposed which exploits a parsimonious geometric reparameterization for line segments and forms a holistic 4-dimensional attraction field map for an input image.

Line Segment Detection Wireframe Parsing

AutoScale: Learning to Scale for Crowd Counting and Localization

2 code implementations20 Dec 2019 Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka

A major issue is that the density map on dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding different large density values on a small set of pixels.

Crowd Counting Model Optimization

Learning Regional Attraction for Line Segment Detection

no code implementations18 Dec 2019 Nan Xue, Song Bai, Fu-Dong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang, Philip H. S. Torr

Given a line segment map, the proposed regional attraction first establishes the relationship between line segments and regions in the image lattice.

Line Segment Detection

Asymmetric Non-local Neural Networks for Semantic Segmentation

5 code implementations ICCV 2019 Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai

The non-local module works as a particularly useful technique for semantic segmentation while criticized for its prohibitive computation and GPU memory occupation.

Semantic Segmentation

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations ICCV 2019 MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

View N-gram Network for 3D Object Retrieval

no code implementations ICCV 2019 Xinwei He, Tengteng Huang, Song Bai, Xiang Bai

By doing so, spatial information across multiple views is captured, which helps to learn a discriminative global embedding for each 3D object.

3D Object Retrieval 3D Shape Classification +2

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

1 code implementation ICCV 2019 Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting Density Estimation

Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification

no code implementations CVPR 2019 Song Bai, Peng Tang, Philip H.S. Torr, Longin Jan Latecki

This work studies the unsupervised re-ranking procedure for object retrieval and person re-identification with a specific concentration on an ensemble of multiple metrics (or similarities).

3D Shape Classification 3D Shape Retrieval +4

CenterNet: Keypoint Triplets for Object Detection

9 code implementations ICCV 2019 Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

In object detection, keypoint-based approaches often suffer a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions.

object-detection Object Detection

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

1 code implementation ECCV 2020 Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille

We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.

object-detection Object Detection +1

Adversarial Metric Attack and Defense for Person Re-identification

1 code implementation30 Jan 2019 Song Bai, Yingwei Li, Yuyin Zhou, Qizhu Li, Philip H. S. Torr

However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images.

Adversarial Attack General Classification +1

Hypergraph Convolution and Hypergraph Attention

1 code implementation23 Jan 2019 Song Bai, Feihu Zhang, Philip H. S. Torr

To efficiently learn deep embeddings on the high-order graph-structured data, we introduce two end-to-end trainable operators to the family of graph neural networks, i. e., hypergraph convolution and hypergraph attention.

Node Classification Representation Learning

Learn to Interpret Atari Agents

1 code implementation29 Dec 2018 Zhao Yang, Song Bai, Li Zhang, Philip H. S. Torr

In contrast to previous a-posteriori methods of visualizing DeepRL policies, we propose an end-to-end trainable framework based on Rainbow, a representative Deep Q-Network (DQN) agent.

Decision Making

Learning Transferable Adversarial Examples via Ghost Networks

1 code implementation9 Dec 2018 Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille

The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models.

Adversarial Attack

Hard-Aware Point-to-Set Deep Metric for Person Re-identification

1 code implementation ECCV 2018 Rui Yu, Zhiyong Dou, Song Bai, Zhao-Xiang Zhang, Yongchao Xu, Xiang Bai

Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion.

Metric Learning Person Re-Identification

PCL: Proposal Cluster Learning for Weakly Supervised Object Detection

4 code implementations9 Jul 2018 Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille

The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one.

Multiple Instance Learning object-detection +2

Semi-Supervised Multi-Organ Segmentation via Deep Multi-Planar Co-Training

no code implementations7 Apr 2018 Yuyin Zhou, Yan Wang, Peng Tang, Song Bai, Wei Shen, Elliot K. Fishman, Alan L. Yuille

In multi-organ segmentation of abdominal CT scans, most existing fully supervised deep learning algorithms require lots of voxel-wise annotations, which are usually difficult, expensive, and slow to obtain.

Image Segmentation Semantic Segmentation

Improving Transferability of Adversarial Examples with Input Diversity

1 code implementation CVPR 2019 Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jian-Yu Wang, Zhou Ren, Alan Yuille

We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.

Adversarial Attack Image Classification

Triplet-Center Loss for Multi-View 3D Object Retrieval

1 code implementation CVPR 2018 Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, Xiang Bai

Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models with softmax loss for the classification of 3D data, while learning discriminative features with deep metric learning for 3D object retrieval is more or less neglected.

3D Object Recognition 3D Object Retrieval +5

Ensemble Diffusion for Retrieval

no code implementations ICCV 2017 Song Bai, Zhichao Zhou, Jingdong Wang, Xiang Bai, Longin Jan Latecki, Qi Tian

This stimulates a great research interest of considering similarity fusion in the framework of diffusion process (i. e., fusion with diffusion) for robust retrieval.

3D Shape Classification 3D Shape Retrieval +2

Scalable Person Re-identification on Supervised Smoothed Manifold

no code implementations CVPR 2017 Song Bai, Xiang Bai, Qi Tian

Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images.

Person Re-Identification

Multidimensional Scaling on Multiple Input Distance Matrices

no code implementations1 May 2016 Song Bai, Xiang Bai, Longin Jan Latecki, Qi Tian

How to do multidimensional scaling on multiple input distance matrices is still unsolved to our best knowledge.

Deep Learning Representation using Autoencoder for 3D Shape Retrieval

no code implementations25 Sep 2014 Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai

By combing the global deep learning representation and the local descriptor representation, our method can obtain the state-of-the-art performance on 3D shape retrieval benchmarks.

3D Shape Classification 3D Shape Recognition +5

Cannot find the paper you are looking for? You can Submit a new open access paper.