Two-Stream Networks for Object Segmentation in Videos

no code implementations 8 Aug 2022 Hannan Lu, Zhi Tian, Lirong Yang, Haibing Ren, WangMeng Zuo

The compact instance stream effectively improves the segmentation accuracy of the unseen pixels, while fusing two streams with the adaptive routing map leads to an overall performance boost.

Retrieval, Semantic Segmentation, +2

Target-Driven Structured Transformer Planner for Vision-Language Navigation

1 code implementation 19 Jul 2022 Yusheng Zhao, Jinyu Chen, Chen Gao, Wenguan Wang, Lirong Yang, Haibing Ren, Huaxia Xia, Si Liu

Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.

Navigate, Vision-Language Navigation

SideRT: A Real-time Pure Transformer Architecture for Single Image Depth Estimation

no code implementations 29 Apr 2022 Chang Shu, Ziming Chen, Lei Chen, Kuan Ma, Minghui Wang, Haibing Ren

To the best of our knowledge, this is the first work to show that transformer-based networks can attain state-of-the-art performance in real-time in the single image depth estimation field.

Depth Estimation

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

1 code implementation CVPR 2022 Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen, Huaxia Xia, Si Liu

3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.

Visual Grounding

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations 30 Mar 2022 Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

8 code implementations NeurIPS 2021 Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.

Image Classification, Semantic Segmentation

SwiftNet: Real-time Video Object Segmentation

1 code implementation CVPR 2021 Haochen Wang, XiaoLong Jiang, Haibing Ren, Yao Hu, Song Bai

In this work, we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed performance.

Semantic Segmentation, Semi-Supervised Video Object Segmentation, +1

