Search Results for author: Yu-Xiong Wang

Found 68 papers, 36 papers with code

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

no code implementations 28 Feb 2024 Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang

The limited scale of current 3D shape datasets hinders the advancements in 3D shape understanding, and motivates multi-modal learning approaches which transfer learned knowledge from data-abundant 2D image and language modalities to 3D shapes.

3D Shape Representation Representation Learning +1

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

1 code implementation NeurIPS 2023 Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects.

Object object-detection +2

ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields

1 code implementation NeurIPS 2023 Jiahua Dong, Yu-Xiong Wang

In addition to the implicit neural radiance field (NeRF) modeling, our key insight is to exploit two sources of regularization that explicitly propagate the editing information across different views, thus ensuring multi-view consistency.

Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models

no code implementations 10 Dec 2023 Zhipeng Bao, Yijun Li, Krishna Kumar Singh, Yu-Xiong Wang, Martial Hebert

Despite significant recent strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation.

Test-time Adaptation

Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

1 code implementation 2 Nov 2023 Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

To address this problem, we propose Primal Wasserstein DICE (PW-DICE), which minimizes the primal Wasserstein distance between the expert and learner state occupancies with a pessimistic regularizer and leverages a contrastively learned distance as the underlying metric for the Wasserstein distance.
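As a minimal illustration of the primal Wasserstein distance at the heart of PW-DICE (a sketch only: the paper matches high-dimensional state occupancies under a contrastively learned metric with a pessimistic regularizer, none of which appear here), the 1D empirical case has a closed form obtained by pairing sorted samples:

```python
import numpy as np

def wasserstein1_1d(x, y):
    # For equal-size 1D empirical distributions, the primal optimal
    # transport plan simply pairs sorted samples.
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean(np.abs(x - y)))

expert_states = [0.0, 1.0, 2.0, 3.0]    # toy 1D "state occupancies"
learner_states = [0.5, 1.5, 2.5, 3.5]
dist = wasserstein1_1d(expert_states, learner_states)
```

Here the learner's states are the expert's shifted by 0.5, so the optimal pairing moves each sample by exactly 0.5.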

Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models

1 code implementation NeurIPS 2023 Andy Zhou, Jindong Wang, Yu-Xiong Wang, Haohan Wang

We propose a conceptually simple and lightweight framework for improving the robustness of vision models through the combination of knowledge distillation and data augmentation.

Data Augmentation Domain Generalization +2
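The distillation component of that framework can be sketched with a standard temperature-scaled distillation loss (a generic sketch under common KD conventions, not the paper's exact objective; the logits below are made up):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; subtracting the max is for stability.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # as in the usual knowledge-distillation formulation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = distillation_loss([0.0, 2.0, -1.0], [2.0, 0.5, -1.0])
```

A matching student incurs zero loss; in practice the student would see augmented views of the teacher's inputs, which is where the robustness transfer comes from.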

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

1 code implementation 19 Oct 2023 Ziqi Pang, Ziyang Xie, Yunze Man, Yu-Xiong Wang

This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language.

Action Recognition Motion Forecasting +4
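A rough sketch of the idea: visual tokens pass through a transformer layer whose weights are frozen. Here a single self-attention layer with fixed random weights stands in for a block lifted from a pretrained LLM (all shapes and weights are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_self_attention(tokens, Wq, Wk, Wv):
    # Single self-attention layer; the weights are never updated,
    # mimicking a frozen transformer block from a language model.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return tokens + attn @ V  # residual connection

d = 16
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
visual_tokens = rng.standard_normal((10, d))  # e.g. patch embeddings
encoded = frozen_self_attention(visual_tokens, Wq, Wk, Wv)
```

Only the modules before and after such a frozen block would be trained on the visual task.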

Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models

1 code implementation 6 Oct 2023 Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents.

Code Generation Decision Making +1

Streaming Motion Forecasting for Autonomous Driving

1 code implementation 2 Oct 2023 Ziqi Pang, Deva Ramanan, Mengtian Li, Yu-Xiong Wang

Our benchmark inherently captures the disappearance and re-appearance of agents, presenting the emergent challenge of forecasting for occluded agents, which is a safety-critical problem yet overlooked by snapshot-based benchmarks.

Autonomous Navigation Motion Forecasting +1

Multi-task View Synthesis with Neural Radiance Fields

1 code implementation ICCV 2023 Shuhong Zheng, Zhipeng Bao, Martial Hebert, Yu-Xiong Wang

To tackle the MTVS problem, we propose MuvieNeRF, a framework that incorporates both multi-task and cross-view knowledge to simultaneously synthesize multiple scene properties.

Novel View Synthesis

Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors

1 code implementation ICCV 2023 Yuanyi Zhong, Anand Bhattad, Yu-Xiong Wang, David Forsyth

Dense depth and surface normal predictors should possess the equivariant property to cropping-and-resizing -- cropping the input image should result in cropping the same output image.

Data Augmentation
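The equivariance property itself is easy to state in code: predicting on a crop should match cropping the prediction. A toy check, using a hypothetical pixelwise predictor (real depth and normal networks use spatial context, which is exactly where this property can fail):

```python
import numpy as np

def toy_depth_predictor(img):
    # Stand-in pixelwise predictor (hypothetical); trivially
    # crop-equivariant because it ignores spatial context.
    return 1.0 / (1.0 + img)

def crop(x, top, left, h, w):
    return x[top:top + h, left:left + w]

img = np.random.default_rng(1).random((8, 8))

# Equivariance to cropping: predict-then-crop == crop-then-predict.
pred_then_crop = crop(toy_depth_predictor(img), 2, 2, 4, 4)
crop_then_pred = toy_depth_predictor(crop(img, 2, 2, 4, 4))
```

For a learned dense predictor, the gap between these two tensors is a direct measure of how far it is from equivariant.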

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations25 Sep 2023 Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination Image Captioning +1

Revisiting Deformable Convolution for Depth Completion

2 code implementations 3 Aug 2023 Xinglong Sun, Jean Ponce, Yu-Xiong Wang

Our study reveals that, different from prior work, deformable convolution needs to be applied on an estimated depth map with a relatively high density for better performance.

Depth Completion

Semi-Supervised Object Detection in the Open World

no code implementations 28 Jul 2023 Garvita Allabadi, Ana Lucic, Peter Pao-Huang, Yu-Xiong Wang, Vikram Adve

Existing approaches for semi-supervised object detection assume a fixed set of classes present in training and unlabeled datasets, i.e., in-distribution (ID) data.

Object object-detection +2

Is Pre-training Truly Better Than Meta-Learning?

no code implementations 24 Jun 2023 Brando Miranda, Patrick Yu, Saumya Goyal, Yu-Xiong Wang, Sanmi Koyejo

Using this analysis, we demonstrate the following: 1. when the formal diversity of a data set is low, PT beats MAML on average and 2. when the formal diversity is high, MAML beats PT on average.

Few-Shot Learning

Stochastic Multi-Person 3D Motion Forecasting

1 code implementation 8 Jun 2023 Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui

This paper aims to deal with the ignored real-world complexities in prior work on human motion forecasting, emphasizing the social properties of multi-person motion, the diversity of motion and social interactions, and the complexity of articulated motion.

Motion Forecasting Stochastic Human Motion Prediction

Robust Model-Based Optimization for Challenging Fitness Landscapes

1 code implementation 23 May 2023 Saba Ghaffari, Ehsan Saleh, Alexander G. Schwing, Yu-Xiong Wang, Martin D. Burke, Saurabh Sinha

Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next.

Benchmarking Protein Design

MV-Map: Offboard HD-Map Generation with Multi-view Consistency

1 code implementation ICCV 2023 Ziyang Xie, Ziqi Pang, Yu-Xiong Wang

To further enhance multi-view consistency, we augment the uncertainty network with the global 3D structure optimized by a voxelized neural radiance field (Voxel-NeRF).

DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

no code implementations 5 May 2023 Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

In this work, we propose DualCross, a cross-modality cross-domain adaptation framework to facilitate the learning of a more robust monocular bird's-eye-view (BEV) perception model, which transfers the point cloud knowledge from a LiDAR sensor in one domain during the training phase to the camera-only testing scenario in a different domain.

Domain Adaptation

NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds

no code implementations CVPR 2023 Jun-Kun Chen, Jipeng Lyu, Yu-Xiong Wang

Our key insight is to exploit the explicit point cloud representation as the underlying structure to construct NeRFs, inspired by the intuitive interpretation of NeRF rendering as a process that projects or "plots" the associated 3D point cloud to a 2D image plane.

Novel View Synthesis

Contrastive Mean Teacher for Domain Adaptive Object Detectors

1 code implementation CVPR 2023 Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain).

Contrastive Learning Object +4

Object Discovery from Motion-Guided Tokens

2 code implementations CVPR 2023 Zhipeng Bao, Pavel Tokmakov, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert

Object discovery -- separating objects from the background without manual labels -- is a fundamental open challenge in computer vision.

Object Object Discovery +2

Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors

1 code implementation 9 Feb 2023 Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui

Predicting diverse human motions given a sequence of historical poses has received increasing attention.

Ranked #1 on Human Pose Forecasting on Human3.6M (MMADE metric)

Human Pose Forecasting motion prediction +1

BEV-Guided Multi-Modality Fusion for Driving Perception

no code implementations CVPR 2023 Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

We design a BEV-guided multi-sensor attention block to take queries from BEV embeddings and learn the BEV representation from sensor-specific features.

Autonomous Driving Representation Learning

Contrastive Learning Relies More on Spatial Inductive Bias Than Supervised Learning: An Empirical Study

no code implementations ICCV 2023 Yuanyi Zhong, Haoran Tang, Jun-Kun Chen, Yu-Xiong Wang

Though self-supervised contrastive learning (CL) has shown its potential to achieve state-of-the-art accuracy without any supervision, its behavior still remains under-investigated by academia.

Contrastive Learning Inductive Bias

Video State-Changing Object Segmentation

1 code implementation ICCV 2023 Jiangwei Yu, Xiang Li, Xinran Zhao, Hongming Zhang, Yu-Xiong Wang

Learning about object state changes in Video Object Segmentation (VOS) is crucial for understanding and interacting with the visual world.

Object Representation Learning +4

Do Pre-trained Models Benefit Equally in Continual Learning?

1 code implementation 27 Oct 2022 Kuan-Ying Lee, Yuanyi Zhong, Yu-Xiong Wang

Existing work on continual learning (CL) is primarily devoted to developing algorithms for models trained from scratch.

Continual Learning

Continual Learning with Evolving Class Ontologies

no code implementations 10 Oct 2022 Zhiqiu Lin, Deepak Pathak, Yu-Xiong Wang, Deva Ramanan, Shu Kong

LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous "dog").

Class Incremental Learning Image Classification +3
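The fine-refines-coarse ontology structure can be sketched with a simple label-coarsening map (the taxonomy below is a made-up example in the spirit of LECO, not its benchmark):

```python
# Hypothetical two-ontology setup: a later time period (TP1)
# introduces "fine" labels that refine TP0's "coarse" ones.
coarse_of = {
    "husky": "dog", "poodle": "dog",
    "tabby": "cat", "siamese": "cat",
}

def coarsen(fine_label):
    # Map a TP1 fine prediction back onto the TP0 coarse ontology,
    # so old coarse-labeled data can still supervise the new model.
    return coarse_of[fine_label]

preds = ["husky", "tabby", "poodle"]
coarse_preds = [coarsen(p) for p in preds]
```

Because every fine label deterministically coarsens, old coarse annotations remain usable as (weaker) supervision for the refined classifier.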

PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees

1 code implementation 11 Aug 2022 Jun-Kun Chen, Yu-Xiong Wang

Being able to learn an effective semantic representation directly on raw point clouds has become a central topic in 3D understanding.

Semantic Segmentation

The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence

no code implementations 2 Aug 2022 Brando Miranda, Patrick Yu, Yu-Xiong Wang, Sanmi Koyejo

This novel insight contextualizes claims that transfer learning solutions are better than meta-learned solutions in the regime of low diversity under a fair comparison.

Few-Shot Learning Transfer Learning

Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields

no code implementations 9 Jun 2022 Mingtong Zhang, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Yu-Xiong Wang

Comprehensive 3D scene understanding, both geometrically and semantically, is important for real-world applications such as robot perception.

Data Augmentation Edge Detection +5

Long-Tailed Recognition via Weight Balancing

1 code implementation CVPR 2022 Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong

In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius.

Classification Long-tail Learning
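The two mechanisms described above, weight decay shrinking all weights and MaxNorm capping them inside a norm ball, can be illustrated with a minimal projection step (toy per-class weights, not the authors' implementation):

```python
import numpy as np

def maxnorm_project(w, radius=1.0):
    # MaxNorm constraint: weights inside the L2 ball are left free
    # to grow; weights outside are rescaled onto the ball's surface.
    norm = np.linalg.norm(w)
    if norm > radius:
        w = w * (radius / norm)
    return w

# Toy classifier weights, one vector per class.
weights = {"frequent": np.array([3.0, 4.0]),  # norm 5, outside the ball
           "rare": np.array([0.3, 0.4])}      # norm 0.5, inside the ball

projected = {c: maxnorm_project(w, radius=1.0) for c, w in weights.items()}
```

After projection the frequent class's oversized weights are capped at the radius while the rare class's small weights are untouched, which is the balancing effect the excerpt describes.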

Discovering Objects that Can Move

1 code implementation CVPR 2022 Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert

Our experiments demonstrate that, despite only capturing a small subset of the objects that move, this signal is enough to generalize to segment both moving and static instances of dynamic objects.

Motion Segmentation Object +1

The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence

no code implementations 24 Dec 2021 Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

We hypothesize that the diversity coefficient of the few-shot learning benchmark is predictive of whether meta-learning solutions will succeed or not.

Few-Shot Learning Transfer Learning

Does MAML Only Work via Feature Re-use? A Data Centric Perspective

1 code implementation 24 Dec 2021 Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks.

Few-Shot Learning

Embracing Single Stride 3D Object Detector with Sparse Transformer

2 code implementations CVPR 2022 Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.

3D Object Detection Autonomous Driving +3

Generative Modeling for Multitask Visual Learning

no code implementations 29 Sep 2021 Zhipeng Bao, Yu-Xiong Wang, Martial Hebert

Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images.

Multi-Task Learning

On the Importance of Distractors for Few-Shot Classification

1 code implementation ICCV 2021 Rajshekhar Das, Yu-Xiong Wang, José M. F. Moura

An effective approach to few-shot classification involves a prior model trained on a large-sample base domain, which is then finetuned over the novel few-shot task to yield generalizable representations.

Classification Contrastive Learning

Generative Modeling for Multi-task Visual Learning

no code implementations 25 Jun 2021 Zhipeng Bao, Martial Hebert, Yu-Xiong Wang

Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images.

Multi-Task Learning

Hallucination Improves Few-Shot Object Detection

no code implementations CVPR 2021 Weilin Zhang, Yu-Xiong Wang

One critical factor in improving few-shot detection is to address the lack of variation in training data.

Few-Shot Object Detection Hallucination +2

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

1 code implementation 12 Apr 2021 Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez

As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level.

Object

DAP: Detection-Aware Pre-training with Weak Supervision

1 code implementation CVPR 2021 Yuanyi Zhong, JianFeng Wang, Lijuan Wang, Jian Peng, Yu-Xiong Wang, Lei Zhang

This paper presents a detection-aware pre-training (DAP) approach, which leverages only weakly-labeled classification-style datasets (e.g., ImageNet) for pre-training, but is specifically tailored to benefit object detection tasks.

Classification General Classification +4

Cooperating RPN's Improve Few-Shot Object Detection

no code implementations 19 Nov 2020 Weilin Zhang, Yu-Xiong Wang, David A. Forsyth

Learning to detect an object in an image from very few training examples - few-shot object detection - is challenging, because the classifier that sees proposal boxes has very little training data.

Few-Shot Object Detection Object +2

Few-Shot Learning with Intra-Class Knowledge Transfer

no code implementations 22 Aug 2020 Vivek Roy, Yan Xu, Yu-Xiong Wang, Kris Kitani, Ruslan Salakhutdinov, Martial Hebert

Recent works have proposed to solve this task by augmenting the training data of the few-shot classes using generative models with the few-shot training samples as the seeds.

Few-Shot Learning Transfer Learning

AlphaNet: Improving Long-Tail Classification By Combining Classifiers

1 code implementation 17 Aug 2020 Nadine Chang, Jayanth Koushik, Aarti Singh, Martial Hebert, Yu-Xiong Wang, Michael J. Tarr

Methods in long-tail learning focus on improving performance for data-poor (rare) classes; however, performance for such classes remains much lower than performance for more data-rich (frequent) classes.

Classification Long-tail Learning +1

Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis

1 code implementation ICLR 2021 Zhipeng Bao, Yu-Xiong Wang, Martial Hebert

We propose a novel task of joint few-shot recognition and novel-view synthesis: given only one or few images of a novel object from arbitrary views with only category annotation, we aim to simultaneously learn an object classifier and generate images of that type of object from new viewpoints.

Data Augmentation Multi-Task Learning +2

Towards Streaming Perception

1 code implementation ECCV 2020 Mengtian Li, Yu-Xiong Wang, Deva Ramanan

While past work has studied the algorithmic trade-off between latency and accuracy, there has not been a clear metric to compare different methods along the Pareto optimal latency-accuracy curve.

Instance Segmentation Motion Forecasting +5
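The streaming setting that this latency-accuracy trade-off lives in can be sketched as follows: a query at wall-clock time t must be answered with the most recent prediction whose processing has already finished (the latencies below are hypothetical):

```python
# Toy streaming-evaluation sketch: each prediction only becomes
# available after its processing finishes, so a query at time t is
# matched to the latest prediction already completed by t.
def latest_finished(predictions, t):
    """predictions: list of (finish_time, result), sorted by finish_time."""
    done = [result for finish, result in predictions if finish <= t]
    return done[-1] if done else None

preds = [(0.1, "frame0"), (0.25, "frame1"), (0.4, "frame2")]
at_t = latest_finished(preds, 0.3)
```

A slower but more accurate model answers more queries with stale predictions, which is why accuracy alone cannot rank methods in this setting.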

Meta-Learning by Hallucinating Useful Examples

no code implementations 25 Sep 2019 Yu-Xiong Wang, Yuki Uchiyama, Martial Hebert, Karteek Alahari

Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks, which aim to learn novel concepts from very few examples.

Few-Shot Learning Hallucination +1

Growing a Brain: Fine-Tuning by Increasing Model Capacity

no code implementations CVPR 2017 Yu-Xiong Wang, Deva Ramanan, Martial Hebert

One of their remarkable properties is the ability to transfer knowledge from a large source dataset to a (typically smaller) target dataset.

Developmental Learning

Learning Compositional Representations for Few-Shot Recognition

no code implementations ICCV 2019 Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert

One of the key limitations of modern deep learning approaches lies in the amount of data required to train them.

Attribute Few-Shot Learning

Few-Shot Human Motion Prediction via Meta-Learning

no code implementations ECCV 2018 Liang-Yan Gui, Yu-Xiong Wang, Deva Ramanan, Jose M. F. Moura

This paper addresses the problem of few-shot human motion prediction, in the spirit of the recent progress on few-shot learning and meta-learning.

Few-Shot Learning Human motion prediction +1

Adversarial Geometry-Aware Human Motion Prediction

no code implementations ECCV 2018 Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, Jose M. F. Moura

We explore an approach to forecasting human motion in a few milliseconds given an input 3D skeleton sequence based on a recurrent encoder-decoder framework.

Human motion prediction motion prediction

Low-Shot Learning from Imaginary Data

1 code implementation CVPR 2018 Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan

Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views.

General Classification

Learning to Model the Tail

no code implementations NeurIPS 2017 Yu-Xiong Wang, Deva Ramanan, Martial Hebert

We cast this problem as transfer learning, where knowledge from the data-rich classes in the head of the distribution is transferred to the data-poor classes in the tail.

Image Classification Transfer Learning

Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs

no code implementations NeurIPS 2016 Yu-Xiong Wang, Martial Hebert

Inspired by the transferability properties of CNNs, we introduce an additional unsupervised meta-training stage that exposes multiple top layer units to a large amount of unlabeled real-world images.

Action Recognition General Classification +2

Model Recommendation: Generating Object Detectors From Few Samples

no code implementations CVPR 2015 Yu-Xiong Wang, Martial Hebert

In this paper, we explore an approach to generating detectors that is radically different from the conventional way of learning a detector from a large corpus of annotated positive and negative data samples.

Object
