Search Results for author: Tao Kong

Found 39 papers, 19 papers with code

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

1 code implementation • 20 Dec 2023 • Hongtao Wu, Ya Jing, Chilam Cheang, Guangzeng Chen, Jiafeng Xu, Xinghang Li, Minghuan Liu, Hang Li, Tao Kong

In this paper, we extend the scope of this effectiveness by showing that visual robot manipulation can significantly benefit from large-scale video generative pre-training.

Ranked #2 on Zero-shot Generalization on CALVIN (using extra training data)

Robot Manipulation Zero-shot Generalization

Paper
Code

Vision-Language Foundation Models as Effective Robot Imitators

no code implementations • 2 Nov 2023 • Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong

We believe RoboFlamingo has the potential to be a cost-effective and easy-to-use solution for robotics manipulation, empowering everyone with the ability to fine-tune their own robotics policy.

Imitation Learning

Paper
Add Code

InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions

1 code implementation • 18 Oct 2023 • Hanbo Zhang, Jie Xu, Yuchen Mo, Tao Kong

Ambiguity is ubiquitous in human communication.

Benchmarking Visual Grounding

Paper
Code

MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation

no code implementations • 7 Aug 2023 • Taozheng Yang, Ya Jing, Hongtao Wu, Jiafeng Xu, Kuankuan Sima, Guangzeng Chen, Qie Sima, Tao Kong

In this paper, we present a novel method for mobile manipulators to perform multiple contact-rich manipulation tasks.

Imitation Learning Representation Learning

Paper
Add Code

Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods

no code implementations • 7 Aug 2023 • Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong

However, the recipes of visual pre-training for robot manipulation tasks are yet to be built.

Contrastive Learning Robot Manipulation +1

Paper
Add Code

ClickSeg: 3D Instance Segmentation with Click-Level Weak Annotations

no code implementations • 19 Jul 2023 • Leyao Liu, Tao Kong, Minzhao Zhu, Jiashuo Fan, Lu Fang

Instead of directly using the model inference way, i. e., mean-shift clustering, to generate the pseudo labels, we propose to use k-means with fixed initial seeds: the annotated points.

3D Instance Segmentation Clustering +3

Paper
Add Code

What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?

2 code implementations • 5 Jul 2023 • Yan Zeng, Hanbo Zhang, Jiani Zheng, Jiangnan Xia, Guoqiang Wei, Yang Wei, Yuchen Zhang, Tao Kong

However, the performance of these models heavily relies on design choices such as network structures, training data, and training strategies, and these choices have not been extensively discussed in the literature, making it difficult to quantify progress in this field.

Instruction Following Language Modelling

227

Paper
Code

Learning to Explore Informative Trajectories and Samples for Embodied Perception

no code implementations • 20 Mar 2023 • Ya Jing, Tao Kong

Unlike static perception methods trained on pre-collected images, the embodied agent can move around in the environment and obtain images of objects from any viewpoints.

Paper
Add Code

Towards Unifying Reference Expression Generation and Comprehension

1 code implementation • 24 Oct 2022 • Duo Zheng, Tao Kong, Ya Jing, Jiaan Wang, Xiaojie Wang

Additionally, IRTF could generate pseudo input regions for the REC task to enable a uniform way for sharing the identical representation space across the REC and REG.

Language Modelling Masked Language Modeling +1

Paper
Code

Generative Category-Level Shape and Pose Estimation with Semantic Primitives

1 code implementation • 3 Oct 2022 • Guanglin Li, Yifeng Li, Zhichao Ye, Qihang Zhang, Tao Kong, Zhaopeng Cui, Guofeng Zhang

Then, by using a SIM(3)-invariant shape descriptor, we gracefully decouple the shape and pose of an object, thus supporting latent shape optimization of target objects in arbitrary poses.

Ranked #2 on 6D Pose Estimation using RGBD on REAL275

6D Pose Estimation using RGBD Object

Paper
Code

Exploring Target Representations for Masked Autoencoders

1 code implementation • 8 Sep 2022 • Xingbin Liu, Jinghao Zhou, Tao Kong, Xianming Lin, Rongrong Ji

Masked autoencoders have become popular training paradigms for self-supervised visual representation learning.

Ranked #6 on Self-Supervised Image Classification on ImageNet (finetuned)

Instance Segmentation Knowledge Distillation +6

Paper
Code

3D Part Assembly Generation with Instance Encoded Transformer

no code implementations • 5 Jul 2022 • Rufeng Zhang, Tao Kong, WeiHao Wang, Xuan Han, Mingyu You

It is desirable to enable robots capable of automatic assembly.

Pose Estimation Relational Reasoning

Paper
Add Code

Learning Design and Construction with Varying-Sized Materials via Prioritized Memory Resets

1 code implementation • 12 Apr 2022 • Yunfei Li, Tao Kong, Lei LI, Yi Wu

Can a robot autonomously learn to design and construct a bridge from varying-sized blocks without a blueprint?

Motion Planning

Paper
Code

Navigating to Objects in Unseen Environments by Distance Prediction

no code implementations • 8 Feb 2022 • Minzhao Zhu, Binglei Zhao, Tao Kong

With the estimated distance map, the agent could simultaneously explore the environment and navigate to the target objects based on a simple human-designed strategy.

Navigate Object

Paper
Add Code

iBOT: Image BERT Pre-Training with Online Tokenizer

1 code implementation • 15 Nov 2021 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

We present a self-supervised framework iBOT that can perform masked prediction with an online tokenizer.

Ranked #1 on Unsupervised Image Classification on ImageNet

Instance Segmentation Language Modelling +6

619

Paper
Code

Self-Supervised Learning by Estimating Twin Class Distributions

2 code implementations • 14 Oct 2021 • Feng Wang, Tao Kong, Rufeng Zhang, Huaping Liu, Hang Li

To solve this problem, we propose to maximize the mutual information between the input and the class predictions.

Ranked #1 on Image Classification on Oxford-IIIT Pet Dataset

Fine-Grained Image Classification Representation Learning +5

Paper
Code

Image BERT Pre-training with Online Tokenizer

no code implementations • ICLR 2022 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces.

Image Classification Instance Segmentation +5

Paper
Add Code

ICM-3D: Instantiated Category Modeling for 3D Instance Segmentation

no code implementations • 26 Aug 2021 • Ruihang Chu, Yukang Chen, Tao Kong, Lu Qi, Lei LI

Separating 3D point clouds into individual instances is an important task for 3D vision.

3D Instance Segmentation Semantic Segmentation

Paper
Add Code

Simultaneous Semantic and Collision Learning for 6-DoF Grasp Pose Estimation

no code implementations • 5 Aug 2021 • Yiming Li, Tao Kong, Ruihang Chu, Yifeng Li, Peng Wang, Lei LI

In a unified framework, we jointly predict the feasible 6-DoF grasp poses, instance semantic segmentation, and collision information.

Multi-Task Learning Pose Estimation +1

Paper
Add Code

Learning to Design and Construct Bridge without Blueprint

no code implementations • 5 Aug 2021 • Yunfei Li, Tao Kong, Lei LI, Yifeng Li, Yi Wu

In this task, the robot needs to first design a feasible bridge architecture for arbitrarily wide cliffs and then manipulate the blocks reliably to construct a stable bridge according to the proposed design.

Motion Planning

Paper
Add Code

SOLO: A Simple Framework for Instance Segmentation

no code implementations • 30 Jun 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation.

Image Matting Instance Segmentation +4

Paper
Add Code

Adversarial Option-Aware Hierarchical Imitation Learning

1 code implementation • 10 Jun 2021 • Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei LI

In particular, we propose an Expectation-Maximization(EM)-style algorithm: an E-step that samples the options of expert conditioned on the current learned policy, and an M-step that updates the low- and high-level policies of agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent.

Imitation Learning

Paper
Code

Scale-aware Automatic Augmentation for Object Detection

1 code implementation • CVPR 2021 • Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei LI, Jiaya Jia

We propose Scale-aware AutoAug to learn data augmentation policies for object detection.

Data Augmentation Instance Segmentation +5

196

Paper
Code

Locate then Segment: A Strong Pipeline for Referring Image Segmentation

no code implementations • CVPR 2021 • Ya Jing, Tao Kong, Wei Wang, Liang Wang, Lei LI, Tieniu Tan

Referring image segmentation aims to segment the objects referred by a natural language expression.

Ranked #5 on Generalized Referring Expression Segmentation on gRefCOCO

Generalized Referring Expression Segmentation Image Segmentation +2

Paper
Add Code

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

6 code implementations • CVPR 2021 • Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei LI, Zehuan Yuan, Changhu Wang, Ping Luo

In our method, however, a fixed sparse set of learned object proposals, total length of $N$, are provided to object recognition head to perform classification and location.

Ranked #5 on 2D Object Detection on CeyMo

Object object-detection +2

27,708

Paper
Code

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

6 code implementations • CVPR 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.

Contrastive Learning Image Classification +7

3,078

Paper
Code

SOLOv2: Dynamic and Fast Instance Segmentation

18 code implementations • NeurIPS 2020 • Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen

Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.

Ranked #10 on Real-time Instance Segmentation on MSCOCO

object-detection Object Detection +4

27,708

Paper
Code

FoveaBox: Beyound Anchor-based Object Detection

no code implementations • ICLR 2020 • Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Lei LI, Jianbo Shi

While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales and aspect ratios for the search of the objects, their performance and generalization ability are also limited to the design of anchors.

Object object-detection +1

Paper
Add Code

SOLO: Segmenting Objects by Locations

24 code implementations • ECCV 2020 • Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei LI

We present a new, embarrassingly simple approach to instance segmentation in images.

Ranked #67 on Instance Segmentation on COCO test-dev

Clustering General Classification +3

27,708

Paper
Code

Task-Aware Monocular Depth Estimation for 3D Object Detection

1 code implementation • 17 Sep 2019 • Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen

In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.

3D Object Detection 3D Object Recognition +4

Paper
Code

Deep Point-wise Prediction for Action Temporal Proposal

1 code implementation • 17 Sep 2019 • Luxuan Li, Tao Kong, Fuchun Sun, Huaping Liu

Detecting actions in videos is an important yet challenging task.

Action Recognition Temporal Action Proposal Generation

Paper
Code

Attention-based Transfer Learning for Brain-computer Interface

no code implementations • 25 Apr 2019 • Chuanqi Tan, Fuchun Sun, Tao Kong, Bin Fang, Wenchang Zhang

Different functional areas of the human brain play different roles in brain activity, which has not been paid sufficient research attention in the brain-computer interface (BCI) field.

Brain Computer Interface Classification +3

Paper
Add Code

FoveaBox: Beyond Anchor-based Object Detector

7 code implementations • 8 Apr 2019 • Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Lei LI, Jianbo Shi

In FoveaBox, an instance is assigned to adjacent feature levels to make the model more accurate. We demonstrate its effectiveness on standard benchmarks and report extensive experimental analysis.

Ranked #89 on Object Detection on COCO test-dev (APM metric)

Object object-detection +1

27,708

Paper
Code

Consistent Optimization for Single-Shot Object Detection

no code implementations • 19 Jan 2019 • Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Jianbo Shi

We present consistent optimization for single stage object detection.

Object object-detection +1

Paper
Add Code

Zoom-In-to-Check: Boosting Video Interpolation via Instance-level Discrimination

no code implementations • CVPR 2019 • Liangzhe Yuan, Yibo Chen, Hantian Liu, Tao Kong, Jianbo Shi

We propose a light-weight video frame interpolation algorithm.

Image Generation object-detection +3

Paper
Add Code

Deep Feature Pyramid Reconfiguration for Object Detection

no code implementations • ECCV 2018 • Tao Kong, Fuchun Sun, Wenbing Huang, Huaping Liu

In this paper, we begin by investigating current feature pyramids solutions, and then reformulate the feature pyramid construction as the feature reconfiguration process.

Object object-detection +1

Paper
Add Code

A Survey on Deep Transfer Learning

no code implementations • 6 Aug 2018 • Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, Chunfang Liu

As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains.

General Classification Transfer Learning

Paper
Add Code

RON: Reverse Connection with Objectness Prior Networks for Object Detection

1 code implementation • CVPR 2017 • Tao Kong, Fuchun Sun, Anbang Yao, Huaping Liu, Ming Lu, Yurong Chen

To address (a), we design the reverse connection, which enables the network to detect objects on multi-levels of CNNs.

Object object-detection +2

355

Paper
Code

HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection

no code implementations • CVPR 2016 • Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun

Almost all of the current top-performing object detection networks employ region proposals to guide the search for object instances.

Object object-detection +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.