Search Results for author: Li Erran Li

Found 41 papers, 15 papers with code

Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning

no code implementations Findings (EMNLP) 2021 Zhan Shi, Hui Liu, Martin Renqiang Min, Christopher Malon, Li Erran Li, Xiaodan Zhu

Image captioning systems are expected to have the ability to combine individual concepts when describing scenes with concept combinations that are not observed during training.

Image Captioning Retrieval

Planner3D: LLM-enhanced graph prior meets 3D indoor scene explicit regularization

no code implementations 19 Mar 2024 Yao Wei, Martin Renqiang Min, George Vosselman, Li Erran Li, Michael Ying Yang

Recent progress has been made in object shape generation with generative models such as diffusion models, which increases shape fidelity.

3D Shape Generation Diversity +3

Learning 3D object-centric representation through prediction

no code implementations 6 Mar 2024 John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning.

Object

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

1 code implementation 9 Feb 2024 Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, QiXing Huang, Li Erran Li

By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities.

Hallucination Natural Language Understanding +2

The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

no code implementations 4 Oct 2023 Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang

Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood.

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

1 code implementation 4 Sep 2023 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that are beyond the region of interest.

Image Classification Instance Segmentation +2

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

1 code implementation 31 Aug 2023 Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang

To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel.

Decision Making

For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

no code implementations 10 Apr 2023 Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao

Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.

Imitation Learning Reinforcement Learning (RL)

ImageCaptioner$^2$: Image Captioner for Image Captioning Bias Amplification Assessment

no code implementations 10 Apr 2023 Eslam Mohamed BAKR, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny

In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers.

Image Captioning

LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation

1 code implementation 4 Apr 2023 Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, QiXing Huang

Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors.

3D Object Detection object-detection +1

Attribute-Centric Compositional Text-to-Image Generation

no code implementations 4 Jan 2023 Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang

We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions.

Attribute Fairness +1

Evaluating Step-by-Step Reasoning through Symbolic Verification

1 code implementation 16 Dec 2022 Yi-Fan Zhang, HANLIN ZHANG, Li Erran Li, Eric Xing

Pre-trained language models (LMs) have shown remarkable reasoning performance using explanations or chain-of-thoughts (CoT) for in-context learning.

In-Context Learning

Improving self-supervised representation learning via sequential adversarial masking

no code implementations 16 Dec 2022 Dylan Sam, Min Bai, Tristan McKinney, Li Erran Li

Recent methods in self-supervised learning have demonstrated that masking-based pretext tasks extend beyond NLP, serving as useful pretraining objectives in computer vision.

Representation Learning Self-Supervised Learning

Policy Adaptation from Foundation Model Feedback

no code implementations CVPR 2023 Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang

When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations.

Decision Making

A General Purpose Neural Architecture for Geospatial Systems

no code implementations 4 Nov 2022 Nasim Rahaman, Martin Weiss, Frederik Träuble, Francesco Locatello, Alexandre Lacoste, Yoshua Bengio, Chris Pal, Li Erran Li, Bernhard Schölkopf

Geospatial Information Systems are used by researchers and Humanitarian Assistance and Disaster Response (HADR) practitioners to support a wide variety of important applications.

Disaster Response Diversity +3

Neural Attentive Circuits

no code implementations 14 Oct 2022 Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities.

Point Cloud Classification text-classification +1

Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

1 code implementation 9 Jun 2022 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.

D4RL Model-based Reinforcement Learning +3

Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation

1 code implementation 2 Feb 2022 Yi-Fan Zhang, HANLIN ZHANG, Zachary C. Lipton, Li Erran Li, Eric P. Xing

Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical, making strong parametric assumptions that are intractable for practical application.

POS Selection bias

A Causal Lens for Controllable Text Generation

no code implementations NeurIPS 2021 Zhiting Hu, Li Erran Li

Controllable text generation concerns two fundamental tasks of wide applications, namely generating text of given attributes (i.e., attribute-conditional generation), and minimally editing existing text to possess desired attributes (i.e., text attribute transfer).

Attribute Causal Inference +3

Vision Transformer with Deformable Attention

2 code implementations CVPR 2022 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts which are beyond the region of interest.

Image Classification Object Detection +1

CausalDyna: Improving Generalization of Dyna-style Reinforcement Learning via Counterfactual-Based Data Augmentation

no code implementations 29 Sep 2021 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

Deep reinforcement learning agents trained in real-world environments with a limited diversity of object properties to learn manipulation tasks tend to suffer overfitting and fail to generalize to unseen testing environments.

counterfactual Data Augmentation +5

Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals

1 code implementation CVPR 2021 Kun Qian, Shilin Zhu, Xinyu Zhang, Li Erran Li

Vehicle detection with visual sensors like lidar and camera is one of the critical functions enabling autonomous driving.

Autonomous Driving

Disentangled Recurrent Wasserstein Autoencoder

no code implementations ICLR 2021 Jun Han, Martin Renqiang Min, Ligong Han, Li Erran Li, Xuan Zhang

Learning disentangled representations leads to interpretable models and facilitates data generation with style transfer, which has been extensively studied on static data such as images in an unsupervised learning framework.

Disentanglement Style Transfer +1

Motion Forecasting with Unlikelihood Training

no code implementations 1 Jan 2021 Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model.

Decoder Motion Forecasting +1

3D Object Detection with Pointformer

1 code implementation CVPR 2021 Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang

In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.

3D Object Detection Object +2

Video Depth Estimation by Fusing Flow-to-Depth Proposals

1 code implementation 30 Dec 2019 Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li, Qifeng Chen

Our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module.

Depth Estimation Optical Flow Estimation

Deep Stereo using Adaptive Thin Volume Representation with Uncertainty Awareness

1 code implementation CVPR 2020 Shuo Cheng, Zexiang Xu, Shilin Zhu, Zhuwen Li, Li Erran Li, Ravi Ramamoorthi, Hao Su

In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions.

3D Reconstruction Point Clouds

Towards Safety-Aware Computing System Design in Autonomous Vehicles

no code implementations 21 May 2019 Hengyu Zhao, Yubo Zhang, Pingfan Meng, Hui Shi, Li Erran Li, Tiancheng Lou, Jishen Zhao

To address this issue, we propose a 'safety score' as a primary metric for measuring the level of safety in AV computing system design.

Autonomous Driving Management

Disentangled Deep Autoencoding Regularization for Robust Image Classification

no code implementations 27 Feb 2019 Zhenyu Duan, Martin Renqiang Min, Li Erran Li, Mingbo Cai, Yi Xu, Bingbing Ni

In spite of achieving revolutionary successes in machine learning, deep convolutional neural networks have been recently found to be vulnerable to adversarial attacks and difficult to generalize to novel test images with reasonably large geometric transformations.

Classification General Classification +2

Fast and Accurate Performance Analysis of LTE Radio Access Networks

no code implementations 16 May 2016 Anand Padmanabha Iyer, Ion Stoica, Mosharaf Chowdhury, Li Erran Li

Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and its real-time nature and dimensionality, which make it a natural fit for a popular analysis technique, machine learning (ML).

Feature Engineering Multi-Task Learning
