24 code implementations • ICCV 2023 • Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.
Ranked #2 on Zero-Shot Instance Segmentation on LVIS v1.0 val
no code implementations • 6 Mar 2023 • Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath
Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist the elderly at home, and colonize new planets.
1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
Finally, we train a 307M-parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.
1 code implementation • 11 Mar 2022 • Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik
This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.
1 code implementation • NeurIPS 2021 • Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick
To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.
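To make the two stem designs concrete, the sketch below compares the spatial resolution each one produces. The numbers are illustrative assumptions, not taken from the entry above: a 224-pixel input, a 16×16 stride-16 patchify convolution, and four stride-two 3×3 convolutions with padding 1 (the count needed to reach the same 16× downsampling).

```python
from typing import List


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def patchify_stem_size(size: int, patch: int = 16) -> int:
    # Patchify stem: a single patch x patch convolution with stride = patch
    # and no padding (patch size 16 is an assumption for this sketch).
    return conv_out_size(size, kernel=patch, stride=patch, padding=0)


def conv_stem_sizes(size: int, num_convs: int = 4) -> List[int]:
    # Convolutional stem: stacked stride-two 3x3 convolutions.
    # num_convs=4 and padding=1 are assumptions chosen to match 16x downsampling.
    sizes = []
    for _ in range(num_convs):
        size = conv_out_size(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(patchify_stem_size(224))  # resolution after the patchify stem
    print(conv_stem_sizes(224))     # resolution after each stacked convolution
```

Under these assumptions both stems end at the same 14×14 grid; the convolutional stem gets there in four small steps instead of one large one.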
1 code implementation • ICCV 2021 • Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell
We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.
1 code implementation • ICLR 2021 • Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
We propose dynamics cycles that align dynamic robot behavior across two domains using a cycle-consistency constraint.
no code implementations • 28 Sep 2020 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell
We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.
no code implementations • ICLR 2021 • Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell
Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.
no code implementations • ICCV 2021 • Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell
We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.
1 code implementation • CVPR 2020 • Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell
Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.
no code implementations • ICCV 2019 • Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou
The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.
4 code implementations • ECCV 2018 • Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang
The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes.
Ranked #186 on Object Detection on COCO test-dev
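The idea of letting localization confidence drive NMS can be sketched as follows. This is a minimal illustration rather than the paper's implementation: boxes are ranked by a predicted localization confidence instead of the classification score, and, as an assumption of this sketch, a kept box inherits the highest classification score among the boxes it suppresses. The function names and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def loc_guided_nms(
    boxes: List[Box],
    loc_conf: List[float],
    cls_score: List[float],
    thresh: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (box index, classification score) pairs for the kept boxes."""
    # Rank by localization confidence, not by classification score.
    remaining = sorted(range(len(boxes)), key=lambda i: loc_conf[i], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        score = cls_score[best]
        survivors = []
        for j in remaining:
            if iou(boxes[best], boxes[j]) > thresh:
                # Suppress j; the kept box takes the larger classification
                # score (an assumption of this sketch).
                score = max(score, cls_score[j])
            else:
                survivors.append(j)
        remaining = survivors
        kept.append((best, score))
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(loc_guided_nms(boxes, [0.6, 0.9, 0.8], [0.95, 0.7, 0.5]))
```

In the usage example, box 1 overlaps box 0 and has the higher localization confidence, so it is the one kept, whereas NMS ranked by classification score would have kept box 0.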
23 code implementations • ECCV 2018 • Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun
In this paper, we study a new task called Unified Perceptual Parsing, which requires machine vision systems to recognize as many visual concepts as possible from a given image.
Ranked #90 on Semantic Segmentation on ADE20K val
1 code implementation • COLING 2018 • Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun
Beginning with an insightful adversarial attack on VSE embeddings, we show the limitations of current frameworks and image-text datasets (e.g., MS-COCO) both quantitatively and qualitatively.
1 code implementation • 30 Apr 2018 • Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun
There are a total of 470K human instances from the train and validation subsets, and ~22.6 persons per image, with various kinds of occlusions in the dataset.
Ranked #7 on Pedestrian Detection on Caltech (using extra training data)
2 code implementations • CVPR 2018 • Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen
In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem.
Ranked #9 on Pedestrian Detection on Caltech (using extra training data)
6 code implementations • CVPR 2018 • Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun
The improvements in recent CNN-based object detection works, from R-CNN [11] and Fast/Faster R-CNN [10, 31] to the more recent Mask R-CNN [14] and RetinaNet [24], mainly come from new networks, new frameworks, or novel loss designs.
no code implementations • CVPR 2017 • Jiayuan Mao, Tete Xiao, Yuning Jiang, Zhimin Cao
Aggregating extra features has been considered an effective approach to boost traditional pedestrian detection methods.
Ranked #15 on Pedestrian Detection on Caltech
21 code implementations • 18 Aug 2016 • Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.