Search Results for author: Tete Xiao

Found 20 papers, 14 papers with code

Segment Anything

18 code implementations • ICCV 2023 • Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.

Ranked #2 on Zero-Shot Instance Segmentation on LVIS v1.0 val

Event-based Object Segmentation Image Segmentation +3

124,593

Paper
Code

Real-World Humanoid Locomotion with Reinforcement Learning

no code implementations • 6 Mar 2023 • Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist elderly at homes, and colonize new planets.

reinforcement-learning

Paper
Add Code

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M parameter vision transformer on a massive collection of 4. 5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

191

Paper
Code

Masked Visual Pre-training for Motor Control

1 code implementation • 11 Mar 2022 • Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

191

Paper
Code

Early Convolutions Help Transformers See Better

1 code implementation • NeurIPS 2021 • Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3*3 convolutions.

Paper
Code

Region Similarity Representation Learning

1 code implementation • ICCV 2021 • Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell

We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

Instance Segmentation Object +5

Paper
Code

Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency

1 code implementation • ICLR 2021 • Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

We propose \textit{dynamics cycles} that align dynamic robot behavior across two domains using a cycle-consistency constraint.

Friction Imitation Learning +1

Paper
Code

Reducing Class Collapse in Metric Learning with Easy Positive Sampling

no code implementations • 28 Sep 2020 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e. g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning +1

Paper
Add Code

What Should Not Be Contrastive in Contrastive Learning

no code implementations • ICLR 2021 • Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.

Contrastive Learning

Paper
Add Code

Rethinking preventing class-collapsing in metric learning with margin-based losses

no code implementations • ICCV 2021 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning +1

Paper
Add Code

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation • CVPR 2020 • Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition Object

138

Paper
Code

Reasoning About Human-Object Interactions Through Dual Attention Networks

no code implementations • ICCV 2019 • Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou

The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.

Human-Object Interaction Detection Object

Paper
Add Code

Acquisition of Localization Confidence for Accurate Object Detection

4 code implementations • ECCV 2018 • Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang

The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes.

Ranked #172 on Object Detection on COCO test-dev

General Classification Object +3

767

Paper
Code

Unified Perceptual Parsing for Scene Understanding

18 code implementations • ECCV 2018 • Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun

In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image.

Ranked #88 on Semantic Segmentation on ADE20K val

Scene Understanding Semantic Segmentation

7,374

Paper
Code

Learning Visually-Grounded Semantics from Contrastive Adversarial Samples

1 code implementation • COLING 2018 • Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun

Begin with an insightful adversarial attack on VSE embeddings, we show the limitation of current frameworks and image-text datasets (e. g., MS-COCO) both quantitatively and qualitatively.

Adversarial Attack Image Captioning

Paper
Code

CrowdHuman: A Benchmark for Detecting Human in a Crowd

1 code implementation • 30 Apr 2018 • Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun

There are a total of $470K$ human instances from the train and validation subsets, and $~22. 6$ persons per image, with various kinds of occlusions in the dataset.

Ranked #7 on Pedestrian Detection on Caltech (using extra training data)

Human Detection Object Detection +1

Paper
Code

Repulsion Loss: Detecting Pedestrians in a Crowd

2 code implementations • CVPR 2018 • Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen

In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem.

Ranked #9 on Pedestrian Detection on Caltech (using extra training data)

Pedestrian Detection regression

233

Paper
Code

MegDet: A Large Mini-Batch Object Detector

6 code implementations • CVPR 2018 • Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun

The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new network, new framework, or novel loss design.

Object object-detection +1