Search Results for author: Alan Yuille

Found 255 papers, 136 papers with code

ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

1 code implementation 13 Jun 2024 Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images.

Image Captioning Linear Probing Object-Level 3D Awareness +2

Autoregressive Pretraining with Mamba in Vision

1 code implementation 11 Jun 2024 Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

1 code implementation 8 Jun 2024 Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.

Conditional Image Generation Denoising +2

DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

1 code implementation CVPR 2024 Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille

Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge (i.e., data scarcity) in large-scale 3D generation.

3D Generation Text to 3D

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

1 code implementation 2 Jun 2024 Xingrui Wang, Wufei Ma, Angtian Wang, Shuo Chen, Adam Kortylewski, Alan Yuille

To demonstrate the importance of an explicit 4D dynamics representation of the scenes in understanding world dynamics, we further propose NS-4Dynamics, a Neural-Symbolic model for reasoning on 4D Dynamics properties under explicit scene representation from videos.

counterfactual Counterfactual Reasoning +3

Quality Sentinel: Estimating Label Quality and Errors in Medical Segmentation Datasets

no code implementations 1 Jun 2024 Yixiong Chen, Zongwei Zhou, Alan Yuille

To bridge this gap, we introduce a regression model, Quality Sentinel, to estimate label quality compared with manual annotations in medical segmentation datasets.

Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

1 code implementation 28 May 2024 Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme.

Computational Efficiency Computed Tomography (CT) +1

HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

1 code implementation 24 May 2024 Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille

In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user input exposure time.

Novel View Synthesis

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

no code implementations 24 May 2024 Sucheng Ren, Hongru Zhu, Chen Wei, Yijiang Li, Alan Yuille, Cihang Xie

This paper presents a new self-supervised video representation learning framework, ARVideo, which autoregressively predicts the next video token in a tailored sequence order.

Representation Learning

Mamba-R: Vision Mamba ALSO Needs Registers

no code implementations 23 May 2024 Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba.

Semantic Segmentation

NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

3 code implementations 22 Apr 2024 Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, Yu Zhu, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang, Qingsen Yan, Wenbin Zou, Weipeng Yang, Yunxiang Li, Qiaomu Wei, Tian Ye, Sixiang Chen, Zhao Zhang, Suiyi Zhao, Bo wang, Yan Luo, Zhichao Zuo, Mingshen Wang, Junhu Wang, Yanyan Wei, Xiaopeng Sun, Yu Gao, Jiancheng Huang, Hongming Chen, Xiang Chen, Hui Tang, Yuanbin Chen, Yuanbo Zhou, Xinwei Dai, Xintao Qiu, Wei Deng, Qinquan Gao, Tong Tong, Mingjia Li, Jin Hu, Xinyu He, Xiaojie Guo, sabarinathan, K Uma, A Sasithradevi, B Sathya Bama, S. Mohamed Mansoor Roomi, V. Srivatsav, Jinjuan Wang, Long Sun, Qiuying Chen, Jiahong Shao, Yizhi Zhang, Marcos V. Conde, Daniel Feijoo, Juan C. Benito, Alvaro García, Jaeho Lee, Seongwan Kim, Sharif S M A, Nodirkhuja Khujaev, Roman Tsoy, Ali Murtaza, Uswah Khairuddin, Ahmad 'Athif Mohd Faudzi, Sampada Malagi, Amogh Joshi, Nikhil Akalwadi, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudenagudi, Wenyi Lian, Wenjing Lian, Jagadeesh Kalyanshetti, Vijayalaxmi Ashok Aralikatti, Palani Yashaswini, Nitish Upasi, Dikshit Hegde, Ujwala Patil, Sujata C, Xingzhuo Yan, Wei Hao, Minghan Fu, Pooja Choksy, Anjali Sarvaiya, Kishor Upla, Kiran Raja, Hailong Yan, Yunkai Zhang, Baiang Li, Jingyi Zhang, Huan Zheng

This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results.

4k Low-Light Image Enhancement +1

Learning a Category-level Object Pose Estimator without Pose Annotations

no code implementations 8 Apr 2024 Fengrui Tian, Yaoyao Liu, Adam Kortylewski, Yueqi Duan, Shaoyi Du, Alan Yuille, Angtian Wang

Instead of using manually annotated images, we leverage diffusion models (e.g., Zero-1-to-3) to generate a set of images under controlled pose differences and propose to learn our object pose estimator with those images.

Object Pose Estimation

Exploiting Structural Consistency of Chest Anatomy for Unsupervised Anomaly Detection in Radiography Images

1 code implementation 13 Mar 2024 Tiange Xiang, Yixiao Zhang, Yongyi Lu, Alan Yuille, Chaoyi Zhang, Weidong Cai, Zongwei Zhou

To this end, we propose a Simple Space-Aware Memory Matrix for In-painting and Detecting anomalies from radiography images (abbreviated as SimSID).

Anatomy Image Reconstruction +1

A Bayesian Approach to OOD Robustness in Image Classification

1 code implementation CVPR 2024 Prakhar Kaushik, Adam Kortylewski, Alan Yuille

This enables us to learn a transitional dictionary of vMF kernels that are intermediate between the source and target domains and train the generative model on this dictionary using the annotations on the source domain, followed by iterative refinement.

Ranked #1 on Unsupervised Domain Adaptation on OOD-CV (Accuracy (Top-1) metric)

Image Classification Unsupervised Domain Adaptation

From Pixel to Cancer: Cellular Automata in Computed Tomography

1 code implementation 11 Mar 2024 Yuxiang Lai, Xiaoxi Chen, Angtian Wang, Alan Yuille, Zongwei Zhou

AI for cancer detection encounters the bottleneck of data scarcity, annotation difficulty, and low prevalence of early tumors.

Computed Tomography (CT)

Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training?

1 code implementation 29 Feb 2024 Tiezheng Zhang, Xiaoxi Chen, Chongyu Qu, Alan Yuille, Zongwei Zhou

Human experts revise the annotations predicted by AI, and in turn, AI improves its predictions by learning from these revised annotations.

Interactive Segmentation

Towards Generalizable Tumor Synthesis

1 code implementation CVPR 2024 Qi Chen, Xiaoxi Chen, Haorui Song, Zhiwei Xiong, Alan Yuille, Chen Wei, Zongwei Zhou

Tumor synthesis enables the creation of artificial tumors in medical images, facilitating the training of AI models for tumor detection and segmentation.

Computed Tomography (CT)

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

no code implementations 16 Feb 2024 Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan, Boyu Wang

Our research undertakes a thorough exploration of the state-of-the-art perceiver resampler architecture and builds a strong baseline.

Language Modelling Question Answering +1

Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation

no code implementations 19 Jan 2024 Prakhar Kaushik, Aayush Mishra, Adam Kortylewski, Alan Yuille

We focus on individual locally robust mesh vertex features and iteratively update them based on their proximity to corresponding features in the target domain even when the global pose is not correct.

Pose Estimation Unsupervised Domain Adaptation

SPFormer: Enhancing Vision Transformer with Superpixel Representation

no code implementations 5 Jan 2024 Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie

In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation.

Superpixels

HISR: Hybrid Implicit Surface Representation for Photorealistic 3D Human Reconstruction

no code implementations 28 Dec 2023 Angtian Wang, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Edmond Boyer, Alan Yuille, Tony Tung

This representation is composed of two surface layers that represent opaque and translucent regions on the clothed human body.

3D Human Reconstruction

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

1 code implementation 21 Dec 2023 Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.

Common Sense Reasoning Descriptive +1

Continual Adversarial Defense

no code implementations 15 Dec 2023 Qian Wang, Yaoyao Liu, Hefei Ling, Yingwei Li, Qihao Liu, Ping Li, Jiazhong Chen, Alan Yuille, Ning Yu

In response to the rapidly evolving nature of adversarial attacks against visual classifiers on a monthly basis, numerous defenses have been proposed to generalize against as many known attacks as possible.

Adversarial Defense Continual Learning +2

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

no code implementations CVPR 2024 Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang

We propose Causal Context Generation, Causal-CoG, which is a prompting strategy that engages contextual information to enhance precise VQA during inference.

Question Answering Visual Question Answering

SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

1 code implementation 4 Dec 2023 Feng Wang, Jieru Mei, Alan Yuille

Specifically, we replace the traditional self-attention block of CLIP vision encoder's last layer by our CSA module and reuse its pretrained projection matrices of query, key, and value, leading to a training-free adaptation approach for CLIP's zero-shot semantic segmentation.

Segmentation Semantic Segmentation +2

Rejuvenating image-GPT as Strong Visual Representation Learners

1 code implementation 4 Dec 2023 Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie

This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict next pixels for visual representation learning.

Representation Learning

Sequential Modeling Enables Scalable Learning for Large Vision Models

1 code implementation CVPR 2024 Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.

Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning

1 code implementation 30 Nov 2023 Ruxiao Duan, Yaoyao Liu, Jieneng Chen, Adam Kortylewski, Alan Yuille

Replay-based methods in class-incremental learning (CIL) have attained remarkable success, as replaying the exemplars of old classes can significantly mitigate catastrophic forgetting.

Class Incremental Learning Data Augmentation +1

Learning Part Segmentation from Synthetic Animals

no code implementations 30 Nov 2023 Jiawei Peng, Ju He, Prakhar Kaushik, Zihao Xiao, Jiteng Mu, Alan Yuille

We then benchmark Syn-to-Real animal part segmentation from SAP to PartImageNet, namely SynRealPart, with existing semantic segmentation domain adaptation methods and further improve them as our second contribution.

Domain Adaptation Pseudo Label +2

A Simple Video Segmenter by Tracking Objects Along Axial Trajectories

2 code implementations 30 Nov 2023 Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Alan Yuille, Xiaohui Shen, Liang-Chieh Chen

In this work, we present Axial-VS, a general and simple framework that enhances video segmenters by tracking objects along axial trajectories.

Object Object Tracking +5

Instruct2Attack: Language-Guided Semantic Adversarial Attacks

no code implementations 27 Nov 2023 Jiang Liu, Chen Wei, Yuxiang Guo, Heng Yu, Alan Yuille, Soheil Feizi, Chun Pong Lau, Rama Chellappa

We propose Instruct2Attack (I2A), a language-guided semantic attack that generates semantically meaningful perturbations according to free-form language instructions.

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

no code implementations 27 Nov 2023 Chenglin Yang, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu

To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs.

Caption Generation Language Modelling +2

Structure-Aware Sparse-View X-ray 3D Reconstruction

1 code implementation CVPR 2024 Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang

In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction.

3D Reconstruction Low-Dose X-Ray Ct Reconstruction +1

3D-Aware Visual Question Answering about Parts, Poses and Occlusions

2 code implementations NeurIPS 2023 Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille

In this work, we introduce the task of 3D-aware VQA, which focuses on challenging questions that require a compositional reasoning over the 3D structure of visual scenes.

Question Answering Visual Question Answering

Synthetic Data as Validation

no code implementations 24 Oct 2023 Qixin Hu, Alan Yuille, Zongwei Zhou

Specifically, the DSC score for liver tumor segmentation improves from 26.7% (95% CI: 22.6%-30.9%) to 34.5% (30.8%-38.2%) when evaluated on an in-domain dataset and from 31.1% (26.0%-36.2%) to 35.4% (32.1%-38.7%) on an out-domain dataset.

Computed Tomography (CT) Continual Learning +1

Acquiring Weak Annotations for Tumor Localization in Temporal and Volumetric Data

1 code implementation 23 Oct 2023 Yu-Cheng Chou, Bowen Li, Deng-Ping Fan, Alan Yuille, Zongwei Zhou

In summary, this research proposes an efficient annotation strategy for tumor detection and localization that is less accurate than per-pixel annotations but useful for creating large-scale datasets for screening tumors in various medical modalities.

Weakly-supervised Learning

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

3 code implementations 11 Oct 2023 Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew Lungren, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou

In this paper, we extend the 2D TransUNet architecture to a 3D network by building upon the state-of-the-art nnU-Net architecture, and fully exploring Transformers' potential in both the encoder and decoder design.

Decoder Image Segmentation +4

FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning

1 code implementation 6 Oct 2023 Peiran Xu, Zeyu Wang, Jieru Mei, Liangqiong Qu, Alan Yuille, Cihang Xie, Yuyin Zhou

Federated learning (FL) is an emerging paradigm in machine learning, where a shared model is collaboratively learned using data from multiple devices to mitigate the risk of data leakage.

Federated Learning

Understanding Pan-Sharpening via Generalized Inverse

no code implementations 4 Oct 2023 Shiqi Liu, Yutong Bai, Xinyang Han, Alan Yuille

By the generalized inverse theory, we derived two forms of general inverse matrix formulations that can correspond to the two prominent classes of Pan-sharpening methods, that is, component substitution and multi-resolution analysis methods.

Boosting Dermatoscopic Lesion Segmentation via Diffusion Models with Visual and Textual Prompts

no code implementations 4 Oct 2023 Shiyi Du, Xiaosong Wang, Yongyi Lu, Yuyin Zhou, Shaoting Zhang, Alan Yuille, Kang Li, Zongwei Zhou

Image synthesis approaches, e.g., generative adversarial networks, have been popular as a form of data augmentation in medical image analysis tasks.

Data Augmentation Image Generation +2

3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation

1 code implementation ICCV 2023 Yi Zhang, Pengliang Ji, Angtian Wang, Jieru Mei, Adam Kortylewski, Alan Yuille

Motivated by the recent success of generative models in rigid object pose estimation, we propose 3D-aware Neural Body Fitting (3DNBF) - an approximate analysis-by-synthesis approach to 3D human pose estimation with SOTA performance and occlusion robustness.

3D Human Pose Estimation Contrastive Learning

Early Detection and Localization of Pancreatic Cancer by Label-Free Tumor Synthesis

1 code implementation 6 Aug 2023 Bowen Li, Yu-Cheng Chou, Shuwen Sun, Hualin Qiao, Alan Yuille, Zongwei Zhou

We further investigate the per-voxel segmentation performance of pancreatic tumors if AI is trained on a combination of CT scans with synthetic tumors and CT scans with annotated large tumors at an advanced stage.

Specificity

SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

1 code implementation 24 Jul 2023 YiQing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou

To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis.

Contrastive Learning Image Reconstruction +4

Generating Images with 3D Annotations Using Diffusion Models

no code implementations 13 Jun 2023 Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille

With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically.

3D Pose Estimation Style Transfer

Compositor: Bottom-up Clustering and Compositing for Robust Part and Object Segmentation

1 code implementation CVPR 2023 Ju He, Jieneng Chen, Ming-Xian Lin, Qihang Yu, Alan Yuille

Compositor achieves state-of-the-art performance on PartImageNet and Pascal-Part by outperforming previous methods by around 0.9% and 1.3% on PartImageNet, 0.4% and 1.7% on Pascal-Part in terms of part and object mIoU and demonstrates better robustness against occlusion by around 4.4% and 7.1% on part and object respectively.

Clustering Object +2

Continual Learning for Abdominal Multi-Organ and Tumor Segmentation

1 code implementation 1 Jun 2023 Yixiao Zhang, Xinyi Li, Huimiao Chen, Alan Yuille, Yaoyao Liu, Zongwei Zhou

The ability to dynamically extend a model to new data and classes is critical for multiple organ and tumor segmentation.

Continual Learning Organ Segmentation +2

Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search

no code implementations 1 Jun 2023 Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille

(2) We find regions in the latent space that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured.

Adversarial Attack Efficient Exploration +1

Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis

no code implementations 31 May 2023 Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski

Human vision demonstrates higher robustness than current AI algorithms under out-of-distribution scenarios.

Robust Category-Level 3D Pose Estimation from Synthetic Data

no code implementations 25 May 2023 Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Alan Yuille, Adam Kortylewski

In this work, we aim to narrow the performance gap between models trained on synthetic data and few real images and fully supervised models trained on large-scale data.

3D Pose Estimation 3D Reconstruction +4

Robust 3D-aware Object Classification via Discriminative Render-and-Compare

no code implementations 24 May 2023 Artur Jesslen, Guofeng Zhang, Angtian Wang, Alan Yuille, Adam Kortylewski

Using differentiable rendering, we estimate the 3D object pose by minimizing the reconstruction error between the mesh and the feature representation of the target image.

Classification Image Classification +2

Label-Free Liver Tumor Segmentation

1 code implementation CVPR 2023 Qixin Hu, Yixiong Chen, Junfei Xiao, Shuwen Sun, Jieneng Chen, Alan Yuille, Zongwei Zhou

We demonstrate that AI models can accurately segment liver tumors without the need for manual annotation by using synthetic tumors in CT scans.

Segmentation Tumor Segmentation

InstMove: Instance Motion for Object-centric Video Segmentation

1 code implementation CVPR 2023 Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai

A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies on appearance similarity and hence is often inaccurate under occlusion and fast movement.

Object Optical Flow Estimation +3

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

1 code implementation CVPR 2023 Qihao Liu, Adam Kortylewski, Alan Yuille

We introduce a learning-based testing method, termed PoseExaminer, that automatically diagnoses HPS algorithms by searching over the parameter space of human pose images to find the failure modes.

Multi-agent Reinforcement Learning

Benchmarking Robustness in Neural Radiance Fields

no code implementations 10 Jan 2023 Chen Wang, Angtian Wang, Junbo Li, Alan Yuille, Cihang Xie

We find that NeRF-based models are significantly degraded in the presence of corruption, and are more sensitive to a different set of corruptions than image recognition models.

Benchmarking Camera Calibration +2

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

2 code implementations ICCV 2023 Jie Liu, Yixiao Zhang, Jie-Neng Chen, Junfei Xiao, Yongyi Lu, Bennett A. Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou

The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training and then evaluated on 6,162 external CT scans from 3 additional datasets.

Organ Segmentation Segmentation +1

Learning Road Scene-level Representations via Semantic Region Prediction

no code implementations 2 Jan 2023 Zihao Xiao, Alan Yuille, Yi-Ting Chen

In this work, we tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images.

Unleashing the Power of Visual Prompting At the Pixel Level

1 code implementation 20 Dec 2022 Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks.

Visual Prompting

AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation

no code implementations 7 Dec 2022 Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li

Through systematic analysis, we found that the commonly used pairwise affinity loss has two limitations: (1) it works with color affinity but leads to inferior performance with other modalities such as depth gradient, (2) the original affinity loss does not prevent trivial predictions as intended but actually accelerates this process due to the affinity loss term being symmetric.

Box-supervised Instance Segmentation Segmentation +2

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models

no code implementations 1 Dec 2022 Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille

Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality.

Attribute Representation Learning

LUMix: Improving Mixup by Better Modelling Label Uncertainty

no code implementations 29 Nov 2022 Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

LUMix is simple as it can be implemented in just a few lines of code and can be universally applied to any deep networks, e.g., CNNs and Vision Transformers, with minimal computational cost.

Data Augmentation

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training

no code implementations ICCV 2023 Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie

Coupling all these designs allows our method to enjoy both competitive performances on text-to-video retrieval and video question answering tasks, and much less pre-training costs by 1.9X or more.

Question Answering Retrieval +3

1st Place Solution of The Robust Vision Challenge 2022 Semantic Segmentation Track

1 code implementation 23 Oct 2022 Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar

The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.

Segmentation Semantic Segmentation

Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification

1 code implementation 23 Oct 2022 Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou

We hope that this study can direct future research on the application of Transformers to a larger variety of medical imaging tasks.

Computational Efficiency Transfer Learning

Context-Enhanced Stereo Transformer

1 code implementation 21 Oct 2022 Weiyu Guo, Zhaoshuo Li, Yongkui Yang, Zheng Wang, Russell H. Taylor, Mathias Unberath, Alan Yuille, Yingwei Li

We construct our stereo depth estimation model, Context Enhanced Stereo Transformer (CSTR), by plugging CEP into the state-of-the-art stereo depth estimation method Stereo Transformer.

Stereo Depth Estimation Stereo Matching

Masked Autoencoders Enable Efficient Knowledge Distillers

1 code implementation CVPR 2023 Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%.

Knowledge Distillation

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

no code implementations 29 Jul 2022 Qihao Liu, Yi Zhang, Song Bai, Alan Yuille

Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions.

3D Human Pose Estimation 3D Multi-Person Pose Estimation (absolute) +2

In Defense of Online Models for Video Instance Segmentation

1 code implementation 21 Jul 2022 Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Ranked #9 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)

Contrastive Learning Instance Segmentation +5

kMaX-DeepLab: k-means Mask Transformer

2 code implementations 8 Jul 2022 Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

However, we observe that most existing transformer-based vision models simply borrow the idea from NLP, neglecting the crucial difference between languages and images, particularly the extremely large sequence length of spatially flattened pixel features.

Clustering Object Detection +1

Unsupervised Domain Adaptation through Shape Modeling for Medical Image Segmentation

1 code implementation 6 Jul 2022 Yuan YAO, Fengze Liu, Zongwei Zhou, Yan Wang, Wei Shen, Alan Yuille, Yongyi Lu

Previous methods proposed Variational Autoencoder (VAE) based models to learn the distribution of shape for a particular organ and used it to automatically evaluate the quality of a segmentation prediction by fitting it into the learned shape distribution.

Image Segmentation Pancreas Segmentation +3

A Simple Data Mixing Prior for Improving Self-Supervised Learning

1 code implementation CVPR 2022 Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie

More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting.

Representation Learning Self-Supervised Learning

VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis

1 code implementation 30 May 2022 Angtian Wang, Peng Wang, Jian Sun, Adam Kortylewski, Alan Yuille

The Gaussian reconstruction kernels have been proposed by Westover (1990) and studied by the computer graphics community back in the 90s, which gives an alternative representation of object 3D geometry from meshes and point clouds.

Pose Estimation

In Defense of Image Pre-Training for Spatiotemporal Recognition

1 code implementation 3 May 2022 Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie

Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels.

STS Video Recognition

Fast AdvProp

1 code implementation ICLR 2022 Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie

Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key for performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.

Data Augmentation object-detection +1

CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation

1 code implementation 22 Mar 2022 Feng Wang, Huiyu Wang, Chen Wei, Alan Yuille, Wei Shen

Recent advances in self-supervised contrastive learning yield good image-level representation, which favors classification tasks but usually neglects pixel-level detailed information, leading to unsatisfactory transfer performance to dense prediction tasks such as semantic segmentation.

Contrastive Learning Representation Learning +2

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

1 code implementation CVPR 2022 Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan

In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations (e.g., rotation) to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.

3D Object Detection Autonomous Driving +2

Lite Vision Transformer with Enhanced Self-Attention

1 code implementation CVPR 2022 Chenglin Yang, Yilin Wang, Jianming Zhang, He Zhang, Zijun Wei, Zhe Lin, Alan Yuille

We propose Lite Vision Transformer (LVT), a novel lightweight transformer network with two enhanced self-attention mechanisms to improve model performance for mobile deployment.

Panoptic Segmentation Segmentation

MT-TransUNet: Mediating Multi-Task Tokens in Transformers for Skin Lesion Segmentation and Classification

1 code implementation 3 Dec 2021 Jingye Chen, Jieneng Chen, Zongwei Zhou, Bin Li, Alan Yuille, Yongyi Lu

However, these approaches formulated skin cancer diagnosis as a simple classification task, dismissing the potential benefit from lesion segmentation.

Classification Computational Efficiency +4

PartImageNet: A Large, High-Quality Dataset of Parts

1 code implementation 2 Dec 2021 Ju He, Shuo Yang, Shaokang Yang, Adam Kortylewski, Xiaoding Yuan, Jie-Neng Chen, Shuai Liu, Cheng Yang, Qihang Yu, Alan Yuille

To help address this problem, we propose PartImageNet, a large, high-quality dataset with part segmentation annotations.

Activity Recognition Few-Shot Learning +6

OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

no code implementations 29 Nov 2021 Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski

One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors.

3D Pose Estimation Benchmarking +5

Learning from Temporal Gradient for Semi-supervised Action Recognition

1 code implementation CVPR 2022 Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She, Zongwei Zhou, Alan Yuille, Yingwei Li

Our method achieves state-of-the-art performance on three video action recognition benchmarks (i.e., Kinetics-400, UCF-101, and HMDB-51) under several typical semi-supervised settings (i.e., different ratios of labeled data).

Action Recognition Temporal Action Localization

TransMix: Attend to Mix for Vision Transformers

2 code implementations CVPR 2022 Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.

Instance Segmentation object-detection +3

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations 15 Nov 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Instance Segmentation Object Recognition +3

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

no code implementations 15 Nov 2021 Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille

In order to effectively search in this huge architecture space, we propose Hierarchical Sampling for better training of the supernet.

Neural Architecture Search

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

1 code implementation NeurIPS 2021 Angtian Wang, Shenxiao Mei, Alan Yuille, Adam Kortylewski

The model is initialized from a few labelled images and is subsequently used to synthesize feature representations of unseen 3D views.

3D Pose Estimation Few-Shot Learning

Image BERT Pre-training with Online Tokenizer

no code implementations ICLR 2022 Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces.

Image Classification Instance Segmentation +5

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

1 code implementation 11 Sep 2021 Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, DaCheng Tao

Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet regarding ARchitecture design (49 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises).

Adversarial Robustness Benchmarking +2

Progressive Stage-wise Learning for Unsupervised Feature Representation Enhancement

no code implementations CVPR 2021 Zefan Li, Chenxi Liu, Alan Yuille, Bingbing Ni, Wenjun Zhang, Wen Gao

For a given unsupervised task, we design multilevel tasks and define different learning stages for the deep network.

Simulated Adversarial Testing of Face Recognition Models

no code implementations CVPR 2022 Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios.

BIG-bench Machine Learning Face Recognition

Glance-and-Gaze Vision Transformer

1 code implementation NeurIPS 2021 Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan Yuille, Wei Shen

It is motivated by the Glance and Gaze behavior of human beings when recognizing objects in natural scenes, with the ability to efficiently model both long-range dependencies and local context.

Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning

1 code implementation 1 Jun 2021 Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu Wang, Alan Yuille

In particular, we decouple the training of the representation and the classifier, and systematically investigate the effects of different data re-sampling techniques when training the whole network including a classifier as well as fine-tuning the feature extractor only.

Visual analogy: Deep learning versus compositional models

no code implementations 14 May 2021 Nicholas Ichien, Qing Liu, Shuhao Fu, Keith J. Holyoak, Alan Yuille, Hongjing Lu

We compared human performance to that of two recent deep learning models (Siamese Network and Relation Network) directly trained to solve these analogy problems, as well as to that of a compositional model that assesses relational similarity between part-based representations.

Relation Network Visual Analogies

Auto-FedAvg: Learnable Federated Averaging for Multi-Institutional Medical Image Segmentation

no code implementations 20 Apr 2021 Yingda Xia, Dong Yang, Wenqi Li, Andriy Myronenko, Daguang Xu, Hirofumi Obinata, Hitoshi Mori, Peng An, Stephanie Harmon, Evrim Turkbey, Baris Turkbey, Bradford Wood, Francesca Patella, Elvira Stellato, Gianpaolo Carrafiello, Anna Ierardi, Alan Yuille, Holger Roth

In this work, we design a new data-driven approach, namely Auto-FedAvg, where aggregation weights are dynamically adjusted, depending on data distributions across data silos and the current training progress of the models.

Federated Learning Image Segmentation +3

Self-Supervised Pillar Motion Learning for Autonomous Driving

1 code implementation CVPR 2021 Chenxu Luo, Xiaodong Yang, Alan Yuille

Autonomous driving can benefit from motion behavior comprehension when interacting with diverse traffic participants in highly dynamic environments.

Autonomous Driving Motion Estimation

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

1 code implementation ICCV 2021 Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, Xiaolong Wang

To deal with the large shape variance, we introduce Articulated Signed Distance Functions (A-SDF) to represent articulated shapes with a disentangled latent space, where we have separate codes for encoding shape and articulation.

Test-time Adaptation

CateNorm: Categorical Normalization for Robust Medical Image Segmentation

1 code implementation 29 Mar 2021 Junfei Xiao, Lequan Yu, Zongwei Zhou, Yutong Bai, Lei Xing, Alan Yuille, Yuyin Zhou

We propose a new normalization strategy, named categorical normalization (CateNorm), to normalize the activations according to categorical statistics.

Image Segmentation Medical Image Segmentation +2

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

no code implementations CVPR 2021 Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang

However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions.

Instance Segmentation Relation Network +3

Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping

1 code implementation 22 Feb 2021 Prakhar Kaushik, Alex Gain, Adam Kortylewski, Alan Yuille

Additionally, current approaches that deal with forgetting ignore the problem of catastrophic remembering, i.e., the worsening ability to discriminate between data from different tasks.

Continual Learning

Occluded Video Instance Segmentation: A Benchmark

2 code implementations 2 Feb 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Instance Segmentation Segmentation +3

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation

1 code implementation ICLR 2021 Angtian Wang, Adam Kortylewski, Alan Yuille

Using differentiable rendering we estimate the 3D object pose by minimizing the reconstruction error between NeMo and the feature representation of the target image.

3D Pose Estimation Contrastive Learning

CORL: Compositional Representation Learning for Few-Shot Classification

no code implementations 28 Jan 2021 Ju He, Adam Kortylewski, Alan Yuille

In particular, during meta-learning, we train a knowledge base that consists of a dictionary of component representations and a dictionary of component activation maps that encode common spatial activation patterns of components.

Classification Few-Shot Image Classification +3

Meticulous Object Segmentation

1 code implementation 13 Dec 2020 Chenglin Yang, Yilin Wang, Jianming Zhang, He Zhang, Zhe Lin, Alan Yuille

To evaluate segmentation quality near object boundaries, we propose the Meticulosity Quality (MQ) score considering both the mask coverage and boundary precision.

2k 4k +5

Mask Guided Matting via Progressive Refinement Network

1 code implementation CVPR 2021 Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille

We propose Mask Guided (MG) Matting, a robust matting framework that takes a general coarse mask as guidance.

Image Matting

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

1 code implementation CVPR 2021 Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

We name this joint task Depth-aware Video Panoptic Segmentation, and propose a new evaluation metric along with two derived datasets for it, which will be made available to the public.

Ranked #1 on Video Panoptic Segmentation on Cityscapes-VPS (using extra training data)

Depth-aware Video Panoptic Segmentation Monocular Depth Estimation +2

Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks

no code implementations 1 Dec 2020 Christian Cosgrove, Adam Kortylewski, Chenglin Yang, Alan Yuille

Second, we find that compositional deep networks, which have part-based representations that lead to innate robustness to natural occlusion, are robust to patch attacks on PASCAL3D+ and the German Traffic Sign Recognition Benchmark, without adversarial training.

Traffic Sign Recognition

Unsupervised Part Discovery via Feature Alignment

no code implementations 1 Dec 2020 Mengqi Guo, Yutong Bai, Zhishuai Zhang, Adam Kortylewski, Alan Yuille

Specifically, given a training image, we find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps.

Object Object Recognition

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

3 code implementations CVPR 2021 Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time.

Panoptic Segmentation

Batch Normalization with Enhanced Linear Transformation

1 code implementation 28 Nov 2020 Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module was designed for improving BN's flexibility of fitting complex data distributions.

Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model

1 code implementation CVPR 2022 Yihong Sun, Adam Kortylewski, Alan Yuille

Moreover, by leveraging an outlier process, Bayesian models can further generalize out-of-distribution to segment partially occluded objects and to predict their amodal object boundaries.

Amodal Instance Segmentation Object +2

Shape-Texture Debiased Neural Network Training

1 code implementation ICLR 2021 Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

To prevent models from exclusively attending to a single cue in representation learning, we augment training data with images with conflicting shape and texture information (e.g., an image of chimpanzee shape but with lemon texture) and, most importantly, provide the corresponding supervisions from shape and texture simultaneously.

Adversarial Robustness Data Augmentation +2

CO2: Consistent Contrast for Unsupervised Visual Representation Learning

no code implementations ICLR 2021 Chen Wei, Huiyu Wang, Wei Shen, Alan Yuille

Regarding the similarity of the query crop to each crop from other images as "unlabeled", the consistency term takes the corresponding similarity of a positive crop as a pseudo label, and encourages consistency between these two similarities.

Contrastive Learning Image Classification +5

Lymph Node Gross Tumor Volume Detection and Segmentation via Distance-based Gating using 3D CT/PET Imaging in Radiotherapy

no code implementations 27 Aug 2020 Zhuotun Zhu, Dakai Jin, Ke Yan, Tsung-Ying Ho, Xianghua Ye, Dazhou Guo, Chun-Hung Chao, Jing Xiao, Alan Yuille, Le Lu

Finding, identifying and segmenting suspicious cancer metastasized lymph nodes from 3D multi-modality imaging is a clinical task of paramount importance.

ASAP-Net: Attention and Structure Aware Point Cloud Sequence Segmentation

1 code implementation 12 Aug 2020 Hanwen Cao, Yongyi Lu, Cewu Lu, Bo Pang, Gongshen Liu, Alan Yuille

In this paper, we further improve spatio-temporal point cloud feature learning with a flexible module called ASAP considering both attention and structure information across frames, which we find as two important factors for successful segmentation in dynamic point clouds.

Segmentation

Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles

no code implementations 6 Jul 2020 Chenxu Luo, Lin Sun, Dariush Dabiri, Alan Yuille

As for vehicles, their trajectories are significantly influenced by the lane geometry and how to effectively use the lane information is of active interest.

Autonomous Vehicles Decoder +2

Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation

no code implementations 28 Jun 2020 Yingda Xia, Dong Yang, Zhiding Yu, Fengze Liu, Jinzheng Cai, Lequan Yu, Zhuotun Zhu, Daguang Xu, Alan Yuille, Holger Roth

Experiments on the NIH pancreas segmentation dataset and a multi-organ segmentation dataset show state-of-the-art performance of the proposed framework on semi-supervised medical image segmentation.

Image Segmentation Organ Segmentation +6

Compositional Convolutional Neural Networks: A Robust and Interpretable Model for Object Recognition under Occlusion

no code implementations 28 Jun 2020 Adam Kortylewski, Qing Liu, Angtian Wang, Yihong Sun, Alan Yuille

The structure of the compositional model enables CompositionalNets to decompose images into objects and context, as well as to further decompose object representations in terms of individual parts and the objects' pose.

Image Classification