Search Results for author: Song Bai

Found 84 papers, 47 papers with code

Hypergraph Convolution and Hypergraph Attention

1 code implementation • 23 Jan 2019 • Song Bai, Feihu Zhang, Philip H. S. Torr

To efficiently learn deep embeddings on the high-order graph-structured data, we introduce two end-to-end trainable operators to the family of graph neural networks, i.e., hypergraph convolution and hypergraph attention.

Node Classification Representation Learning

20,054

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

3 code implementations • CVPR 2022 • Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, followed by re-identification (re-ID) for object association.

Multi-Object Tracking Object +3

12,022

Asymmetric Non-local Neural Networks for Semantic Segmentation

5 code implementations • ICCV 2019 • Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai

The non-local module is a particularly useful technique for semantic segmentation, but it is criticized for its prohibitive computation and GPU memory occupation.

Ranked #15 on Semantic Segmentation on COCO-Stuff test

Segmentation Semantic Segmentation

8,218

CenterNet: Keypoint Triplets for Object Detection

20 code implementations • ICCV 2019 • Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

In object detection, keypoint-based approaches often suffer from a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions.

Ranked #116 on Object Detection on COCO test-dev

Object object-detection +1

1,851

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

3 code implementations • 26 Jun 2023 • Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

In this work, we extend this editing framework to diffusion models and propose a novel approach DragDiffusion.

1,009

General Object Foundation Model for Images and Videos at Scale

1 code implementation • 14 Dec 2023 • Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai

We present GLEE in this work, an object-level foundation model for locating and identifying objects in images and videos.

Ranked #1 on Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation) (using extra training data)

Long-tail Video Object Segmentation Multi-Object Tracking +8

868

SeqFormer: Sequential Transformer for Video Instance Segmentation

2 code implementations • 15 Dec 2021 • Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms shall be done with each frame independently.

Ranked #2 on Video Instance Segmentation on HQ-YTVIS

Instance Segmentation Semantic Segmentation +1

591

In Defense of Online Models for Video Instance Segmentation

1 code implementation • 21 Jul 2022 • Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Ranked #9 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)

Contrastive Learning Instance Segmentation +5

591

InstMove: Instance Motion for Object-centric Video Segmentation

1 code implementation • CVPR 2023 • Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai

A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies on appearance similarity and hence is often inaccurate under occlusion and fast movement.

Object Optical Flow Estimation +3

591

TransMix: Attend to Mix for Vision Transformers

2 code implementations • CVPR 2022 • Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.

Instance Segmentation object-detection +3

566

MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

1 code implementation • ICCV 2023 • Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Philip H. S. Torr, Song Bai

However, since the target objects in these existing datasets are usually relatively salient, dominant, and isolated, VOS under complex scenes has rarely been studied.

Object Segmentation +3

291

Learning Attraction Field Representation for Robust Line Segment Detection

1 code implementation • CVPR 2019 • Nan Xue, Song Bai, Fu-Dong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang

In experiments, our method is tested on the WireFrame dataset and the YorkUrban dataset with state-of-the-art performance obtained.

Line Segment Detection Semantic Segmentation

289

Holistically-Attracted Wireframe Parsing

1 code implementation • CVPR 2020 • Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

For computing line segment proposals, a novel exact dual representation is proposed which exploits a parsimonious geometric reparameterization for line segments and forms a holistic 4-dimensional attraction field map for an input image.

Ranked #4 on Line Segment Detection on York Urban Dataset

Line Segment Detection Wireframe Parsing

276

Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning

1 code implementation • 24 Oct 2022 • Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

This article presents Holistically-Attracted Wireframe Parsing (HAWP), a method for geometric analysis of 2D images containing wireframes formed by line segments and junctions.

Self-Supervised Learning Wireframe Parsing

276

PCL: Proposal Cluster Learning for Weakly Supervised Object Detection

4 code implementations • 9 Jul 2018 • Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille

The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one.

Ranked #1 on Weakly Supervised Object Detection on ImageNet

Multiple Instance Learning Object +3

245

XingGAN for Person Image Generation

2 code implementations • ECCV 2020 • Hao Tang, Song Bai, Li Zhang, Philip H. S. Torr, Nicu Sebe

We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i.e., translating the pose of a given person to a desired one.

Ranked #1 on Pose Transfer on Market-1501 (IS metric)

Generative Adversarial Network Pose Transfer

229

PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

1 code implementation • CVPR 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.

Ranked #2 on 3D Open-Vocabulary Instance Segmentation on S3DIS

3D Open-Vocabulary Instance Segmentation Contrastive Learning +4

201

Corner Proposal Network for Anchor-free, Two-stage Object Detection

1 code implementation • ECCV 2020 • Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

On the MS-COCO dataset, CPN achieves an AP of 49.2%, which is competitive among state-of-the-art object detection methods.

Ranked #94 on Object Detection on COCO test-dev

Computational Efficiency Object +3

193

CenterNet++ for Object Detection

2 code implementations • 18 Apr 2022 • Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

Our approach, named CenterNet, detects each object as a triplet of keypoints (top-left and bottom-right corners and the center keypoint).

Ranked #35 on Object Detection on COCO test-dev

Object object-detection +1

177

SRFormer: Permuted Self-Attention for Single Image Super-Resolution

1 code implementation • ICCV 2023 • Yupeng Zhou, Zhen Li, Chun-Le Guo, Song Bai, Ming-Ming Cheng, Qibin Hou

Previous works have shown that increasing the window size for Transformer-based image super-resolution models (e.g., SwinIR) can significantly improve the model performance but the computation overhead is also considerable.

Image Super-Resolution

168

Anchor-Free Person Search

1 code implementation • CVPR 2021 • Yichao Yan, Jinpeng Li, Jie Qin, Song Bai, Shengcai Liao, Li Liu, Fan Zhu, Ling Shao

Person search aims to simultaneously localize and identify a query person from realistic, uncropped images, which can be regarded as the unified task of pedestrian detection and person re-identification (re-id).

Ranked #10 on Person Search on CUHK-SYSU

Pedestrian Detection Person Re-Identification +1

166

Is synthetic data from generative models ready for image recognition?

1 code implementation • 14 Oct 2022 • Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.

Text-to-Image Generation Transfer Learning

162

Improving Transferability of Adversarial Examples with Input Diversity

2 code implementations • CVPR 2019 • Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jian-Yu Wang, Zhou Ren, Alan Yuille

We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.

Adversarial Attack Image Classification

159

Location-Sensitive Visual Recognition with Cross-IOU Loss

1 code implementation • 11 Apr 2021 • Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks.

Ranked #56 on Object Detection on COCO test-dev

2D Human Pose Estimation Instance Segmentation +5

154

End-to-end Temporal Action Detection with Transformer

1 code implementation • 18 Jun 2021 • Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video.

Ranked #8 on Temporal Action Localization on HACS

Action Detection Temporal Action Localization +1

139

Bipartite Graph Reasoning GANs for Person Image Generation

1 code implementation • 10 Aug 2020 • Hao Tang, Song Bai, Philip H. S. Torr, Nicu Sebe

We present a novel Bipartite Graph Reasoning GAN (BiGraphGAN) for the challenging person image generation task.

Ranked #1 on Pose Transfer on Market-1501 (PCKh metric)

Pose Transfer

128

Visual Parser: Representing Part-whole Hierarchies with Transformers

2 code implementations • 13 Jul 2021 • Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr

To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.

Ranked #313 on Image Classification on ImageNet

Image Classification Instance Segmentation +3

124

Anchor Diffusion for Unsupervised Video Object Segmentation

1 code implementation • ICCV 2019 • Zhao Yang, Qiang Wang, Luca Bertinetto, Weiming Hu, Song Bai, Philip H. S. Torr

Unsupervised video object segmentation has often been tackled by methods based on recurrent neural networks and optical flow.

Ranked #18 on Unsupervised Video Object Segmentation on DAVIS 2016 val

Image Segmentation Object +4

116

Dual Attention GANs for Semantic Image Synthesis

1 code implementation • 29 Aug 2020 • Hao Tang, Song Bai, Nicu Sebe

We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively.

Image Generation Position

110

Neural Architecture Search for Lightweight Non-Local Networks

2 code implementations • CVPR 2020 • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille

However, it has been rarely explored to embed the NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally have heavy computation cost which makes it difficult to be applied in applications where computational resources are limited, and 2) it is an open problem to discover an optimal configuration to embed NL blocks into mobile neural networks.

Ranked #60 on Neural Architecture Search on ImageNet

Image Classification Neural Architecture Search

105

SwiftNet: Real-time Video Object Segmentation

1 code implementation • CVPR 2021 • Haochen Wang, XiaoLong Jiang, Haibing Ren, Yao Hu, Song Bai

In this work, we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed performance.

Object Segmentation +3

An Empirical Study of End-to-End Temporal Action Detection

1 code implementation • CVPR 2022 • Xiaolong Liu, Song Bai, Xiang Bai

Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification, and only the detection head upon the encoder is optimized for TAD.

Ranked #17 on Temporal Action Localization on THUMOS’14

Action Classification Action Detection +2

Occluded Video Instance Segmentation: A Benchmark

2 code implementations • 2 Feb 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Ranked #39 on Video Instance Segmentation on OVIS validation

Instance Segmentation Segmentation +3

Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability

1 code implementation • CVPR 2022 • Ruifei He, Shuyang Sun, Jihan Yang, Song Bai, Xiaojuan Qi

Large-scale pre-training has been proven to be crucial for various computer vision tasks.

Knowledge Distillation

Learning Transferable Adversarial Examples via Ghost Networks

1 code implementation • 9 Dec 2018 • Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille

The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models.

Adversarial Attack

Multi-shot Temporal Event Localization: a Benchmark

1 code implementation • CVPR 2021 • Xiaolong Liu, Yao Hu, Song Bai, Fei Ding, Xiang Bai, Philip H. S. Torr

Current developments in temporal event or action localization usually target actions captured by a single camera.

Ranked #2 on Temporal Action Localization on MUSES

Temporal Action Localization

Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning

2 code implementations • 1 Oct 2022 • Yujun Shi, Jian Liang, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

To remedy this problem caused by the data heterogeneity, we propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.

Federated Learning

Triplet-Center Loss for Multi-View 3D Object Retrieval

1 code implementation • CVPR 2018 • Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, Xiang Bai

Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models with softmax loss for the classification of 3D data, while learning discriminative features with deep metric learning for 3D object retrieval is more or less neglected.

3D Object Recognition 3D Object Retrieval +6

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

1 code implementation • ECCV 2020 • Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille

We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.

object-detection Object Detection +1

Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence

1 code implementation • 18 Nov 2021 • Xiang Bai, Hanchen Wang, Liya Ma, Yongchao Xu, Jiefeng Gan, Ziwei Fan, Fan Yang, Ke Ma, Jiehua Yang, Song Bai, Chang Shu, Xinyu Zou, Renhao Huang, Changzheng Zhang, Xiaowu Liu, Dandan Tu, Chuou Xu, Wenqing Zhang, Xi Wang, Anguo Chen, Yu Zeng, Dehua Yang, Ming-Wei Wang, Nagaraj Holalkere, Neil J. Halin, Ihab R. Kamel, Jia Wu, Xuehua Peng, Xiang Wang, Jianbo Shao, Pattanasak Mongkolwat, Jianjun Zhang, Weiyang Liu, Michael Roberts, Zhongzhao Teng, Lucian Beer, Lorena Escudero Sanchez, Evis Sala, Daniel Rubin, Adrian Weller, Joan Lasenby, Chuangsheng Zheng, Jianming Wang, Zhen Li, Carola-Bibiane Schönlieb, Tian Xia

Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses.

COVID-19 Diagnosis Federated Learning +2

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

1 code implementation • CVPR 2022 • Yujun Shi, Kuangqi Zhou, Jian Liang, Zihang Jiang, Jiashi Feng, Philip Torr, Song Bai, Vincent Y. F. Tan

Specifically, we experimentally show that directly encouraging the CIL learner at the initial phase to output representations similar to those of the model jointly trained on all classes can greatly boost the CIL performance.

Class Incremental Learning Incremental Learning

Adversarial Metric Attack and Defense for Person Re-identification

1 code implementation • 30 Jan 2019 • Song Bai, Yingwei Li, Yuyin Zhou, Qizhu Li, Philip H. S. Torr

However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images.

Adversarial Attack Benchmarking +2

AutoScale: Learning to Scale for Crowd Counting and Localization

2 code implementations • 20 Dec 2019 • Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka

A major issue is that the density map on dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding different large density values on a small set of pixels.

Crowd Counting Model Optimization

Learn to Interpret Atari Agents

1 code implementation • 29 Dec 2018 • Zhao Yang, Song Bai, Li Zhang, Philip H. S. Torr

Deep reinforcement learning (DeepRL) agents surpass human-level performance in many tasks.

Decision Making

Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

1 code implementation • 15 Nov 2021 • Anirudh S Chakravarthy, Won-Dong Jang, Zudi Lin, Donglai Wei, Song Bai, Hanspeter Pfister

Motivated by this, we propose a video instance segmentation method that alleviates the problem due to missing detections.

Ranked #40 on Video Instance Segmentation on YouTube-VIS validation

Instance Segmentation Segmentation +2

Hard-Aware Point-to-Set Deep Metric for Person Re-identification

1 code implementation • ECCV 2018 • Rui Yu, Zhiyong Dou, Song Bai, Zhao-Xiang Zhang, Yongchao Xu, Xiang Bai

Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion.

Metric Learning Person Re-Identification

Fourier Document Restoration for Robust Document Dewarping and Recognition

1 code implementation • CVPR 2022 • Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai

State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents, which is prone to errors when dealing with documents with irregular distortions or large variations in depth.

Semi-Supervised Multi-Organ Segmentation via Deep Multi-Planar Co-Training

no code implementations • 7 Apr 2018 • Yuyin Zhou, Yan Wang, Peng Tang, Song Bai, Wei Shen, Elliot K. Fishman, Alan L. Yuille

In multi-organ segmentation of abdominal CT scans, most existing fully supervised deep learning algorithms require lots of voxel-wise annotations, which are usually difficult, expensive, and slow to obtain.

Image Segmentation Organ Segmentation +2

Multidimensional Scaling on Multiple Input Distance Matrices

no code implementations • 1 May 2016 • Song Bai, Xiang Bai, Longin Jan Latecki, Qi Tian

How to perform multidimensional scaling on multiple input distance matrices is, to the best of our knowledge, still unsolved.

Divide and Fuse: A Re-ranking Approach for Person Re-identification

no code implementations • 11 Aug 2017 • Rui Yu, Zhichao Zhou, Song Bai, Xiang Bai

Finally, the new features from the same image are fused into one vector for re-ranking.

Person Re-Identification Re-Ranking

GIFT: A Real-time and Scalable 3D Shape Search Engine

no code implementations • CVPR 2016 • Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, Longin Jan Latecki

We name the proposed 3D shape search engine, which combines GPU acceleration and Inverted File Twice, as GIFT.

3D Shape Classification 3D Shape Retrieval +2

Scalable Person Re-identification on Supervised Smoothed Manifold

no code implementations • CVPR 2017 • Song Bai, Xiang Bai, Qi Tian

Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images.

Ranked #100 on Person Re-Identification on Market-1501

Person Re-Identification

Deep Learning Representation using Autoencoder for 3D Shape Retrieval

no code implementations • 25 Sep 2014 • Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai

By combining the global deep learning representation and the local descriptor representation, our method can obtain state-of-the-art performance on 3D shape retrieval benchmarks.

3D Shape Classification 3D Shape Recognition +5

Ensemble Diffusion for Retrieval

no code implementations • ICCV 2017 • Song Bai, Zhichao Zhou, Jingdong Wang, Xiang Bai, Longin Jan Latecki, Qi Tian

This stimulates great research interest in considering similarity fusion in the framework of diffusion process (i.e., fusion with diffusion) for robust retrieval.

3D Shape Classification 3D Shape Retrieval +2

Prior-aware Neural Network for Partially-Supervised Multi-Organ Segmentation

no code implementations • ICCV 2019 • Yuyin Zhou, Zhe Li, Song Bai, Chong Wang, Xinlei Chen, Mei Han, Elliot Fishman, Alan Yuille

Accurate multi-organ abdominal CT segmentation is essential to many clinical applications such as computer-aided intervention.

Medical Image Segmentation Organ Segmentation +2

Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification

no code implementations • CVPR 2019 • Song Bai, Peng Tang, Philip H.S. Torr, Longin Jan Latecki

This work studies the unsupervised re-ranking procedure for object retrieval and person re-identification with a specific concentration on an ensemble of multiple metrics (or similarities).

3D Shape Classification 3D Shape Retrieval +4

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

no code implementations • ICCV 2019 • Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting Density Estimation

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations • ICCV 2019 • MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

View N-gram Network for 3D Object Retrieval

no code implementations • ICCV 2019 • Xinwei He, Tengteng Huang, Song Bai, Xiang Bai

By doing so, spatial information across multiple views is captured, which helps to learn a discriminative global embedding for each 3D object.

3D Object Retrieval 3D Shape Classification +3

Learning Regional Attraction for Line Segment Detection

no code implementations • 18 Dec 2019 • Nan Xue, Song Bai, Fu-Dong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang, Philip H. S. Torr

Given a line segment map, the proposed regional attraction first establishes the relationship between line segments and regions in the image lattice.

Line Segment Detection

FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition

no code implementations • 22 Jul 2020 • Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai

In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them on local devices.

Federated Learning Privacy Preserving +1

Reinforced Wasserstein Training for Severity-Aware Semantic Segmentation in Autonomous Driving

no code implementations • 11 Aug 2020 • Xiaofeng Liu, Yimeng Zhang, Xiongchang Liu, Song Bai, Site Li, Jane You

The ground metric of Wasserstein distance can be pre-defined following the experience on a specific task.

Autonomous Driving Semantic Segmentation

Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training

no code implementations • 21 Oct 2020 • Xiaofeng Liu, Yuzhuo Han, Song Bai, Yi Ge, Tianxing Wang, Xu Han, Site Li, Jane You, Ju Lu

However, the cross entropy loss cannot take the different importance of each class in a self-driving system into account.

Segmentation Self-Driving Cars +1

I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition

no code implementations • 18 May 2021 • Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Changhu Wang, Song Bai

The first task focuses on image-to-character (I2C) mapping which detects a set of character candidates from images based on different alignments of visual features in a non-sequential way.

Scene Text Recognition

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

no code implementations • ICCV 2021 • Bin Tan, Nan Xue, Song Bai, Tianfu Wu, Gui-Song Xia

This paper presents a neural network built upon Transformers, namely PlaneTR, to simultaneously detect and reconstruct planes from a single image.

LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition

no code implementations • 3 Sep 2021 • Xinwei He, Silin Cheng, Dingkang Liang, Song Bai, Xi Wang, Yingying Zhu

To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.

3D Object Classification 3D Object Retrieval +3

Contextual Text Detection

no code implementations • 29 Sep 2021 • Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Song Bai, Changhu Wang

This paper presents Contextual Text Detection, a new setup that detects contextual text blocks for better understanding of texts in scenes.

Text Detection

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations • 15 Nov 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Instance Segmentation Object Recognition +3

Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

no code implementations • 8 Mar 2022 • Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai

Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations.

Optical Character Recognition Optical Character Recognition (OCR) +2

YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset

no code implementations • CVPR 2022 • Donglai Wei, Siddhant Kharbanda, Sarthak Arora, Roshan Roy, Nishant Jain, Akash Palrecha, Tanav Shah, Shray Mathur, Ritik Mathur, Abhijay Kemkar, Anirudh Chakravarthy, Zudi Lin, Won-Dong Jang, Yansong Tang, Song Bai, James Tompkin, Philip H.S. Torr, Hanspeter Pfister

Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) only consider single-shot videos.

Management Segmentation +5

Contextual Text Block Detection towards Scene Text Understanding

no code implementations • 26 Jul 2022 • Chuhui Xue, Jiaxing Huang, Shijian Lu, Changhu Wang, Song Bai

We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.

text-classification Text Classification +2

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

no code implementations • 29 Jul 2022 • Qihao Liu, Yi Zhang, Song Bai, Alan Yuille

Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions.

Ranked #10 on 3D Multi-Person Pose Estimation (absolute) on MuPoTS-3D

3D Human Pose Estimation 3D Multi-Person Pose Estimation (absolute) +2

Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

no code implementations • 4 Aug 2022 • Zhangzi Zhu, Yu Hao, Wenqing Zhang, Chuhui Xue, Song Bai

This report presents our 2nd place solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition.

1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words

no code implementations • 1 Sep 2022 • Zhangzi Zhu, Chuhui Xue, Yu Hao, Wenqing Zhang, Song Bai

Our oCLIP-based model achieves 28.59% in h-mean which ranks 1st in end-to-end OOV word recognition track of OOV Challenge in ECCV2022 TiE Workshop.

Autonomous Driving Scene Text Recognition +1

The Runner-up Solution for YouTube-VIS Long Video Challenge 2022

no code implementations • 18 Nov 2022 • Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai

This technical report describes our 2nd-place solution for the ECCV 2022 YouTube-VIS Long Video Challenge.

Contrastive Learning Instance Segmentation +2

LUMix: Improving Mixup by Better Modelling Label Uncertainty

no code implementations • 29 Nov 2022 • Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

LUMix is simple: it can be implemented in just a few lines of code and can be universally applied to any deep network, e.g., CNNs and Vision Transformers, with minimal computational cost.

Data Augmentation

PV3D: A 3D Generative Model for Portrait Video Generation

no code implementations • 13 Dec 2022 • Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou

While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos.

Video Generation

Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search

no code implementations • 1 Jun 2023 • Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille

(2) We find regions in the latent space that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured.

Adversarial Attack Efficient Exploration +1

Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding

no code implementations • 1 Aug 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes.

Ranked #3 on 3D Open-Vocabulary Instance Segmentation on S3DIS

3D Open-Vocabulary Instance Segmentation Instance Segmentation +4

Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

no code implementations • 13 Aug 2023 • David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou

Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection, and pose additional challenges due to concerns regarding data privacy.

Contrastive Learning Image Classification +2

Dataset Condensation via Generative Model

no code implementations • 14 Sep 2023 • David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou

Dataset condensation aims to condense a large dataset with a lot of training samples into a small set.

Dataset Condensation

Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery

no code implementations • 5 Dec 2023 • Yansheng Li, Junwei Luo, Yongjun Zhang, Yihua Tan, Jin-Gang Yu, Song Bai

Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size very-high-resolution (VHR) RSIs.

object-detection Object Detection

Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

no code implementations • 5 Jan 2024 • Song Bai, Jie Li

Since 2023, a large number of research papers have emerged in the domain of 3D generation.

3D Generation

Debiasing Text-to-Image Diffusion Models

no code implementations • 22 Feb 2024 • Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi

Despite its simplicity, we show that IDA is efficient and converges quickly in resolving the social bias in TTI diffusion models.
