Search Results for author: Li Erran Li

Found 40 papers, 13 papers with code

Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning

no code implementations • Findings (EMNLP) 2021 • Zhan Shi, Hui Liu, Martin Renqiang Min, Christopher Malon, Li Erran Li, Xiaodan Zhu

Image captioning systems are expected to have the ability to combine individual concepts when describing scenes with concept combinations that are not observed during training.

Image Captioning Retrieval

Semantic Aligned Multi-modal Transformer for Vision-Language Understanding: A Preliminary Study on Visual QA

no code implementations • NAACL (maiworkshop) 2021 • Han Ding, Li Erran Li, Zhiting Hu, Yi Xu, Dilek Hakkani-Tur, Zheng Du, Belinda Zeng

Recent vision-language understanding approaches adopt a multi-modal transformer pre-training and finetuning paradigm.

Question Answering Visual Question Answering

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

no code implementations • 15 May 2024 • Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

Learning commonsense reasoning from visual contexts and scenes in the real world is a crucial step toward advanced artificial intelligence.

General Knowledge Knowledge Graphs +1

Language Models Can Reduce Asymmetry in Information Markets

no code implementations • 21 Mar 2024 • Nasim Rahaman, Martin Weiss, Manuel Wüthrich, Yoshua Bengio, Li Erran Li, Chris Pal, Bernhard Schölkopf

This work addresses the buyer's inspection paradox for information markets.

Compositional 3D Scene Synthesis with Scene Graph Guided Layout-Shape Generation

no code implementations • 19 Mar 2024 • Yao Wei, Martin Renqiang Min, George Vosselman, Li Erran Li, Michael Ying Yang

Recent progress has been made in shape generation with powerful generative models, such as diffusion models, which increases shape fidelity.

3D Shape Generation Language Modelling +2

Learning 3D object-centric representation through prediction

no code implementations • 6 Mar 2024 • John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning.

Object

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

no code implementations • 9 Feb 2024 • Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, QiXing Huang, Li Erran Li

By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities.

Hallucination Natural Language Understanding +2

AffordanceLLM: Grounding Affordance from Vision Language Models

no code implementations • 12 Jan 2024 • Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li

Affordance grounding refers to the task of finding the area of an object with which one can interact.

Human-Object Interaction Detection Object

The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

no code implementations • 4 Oct 2023 • Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang

Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood.

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

1 code implementation • 4 Sep 2023 • Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that are beyond the region of interests.

Ranked #4 on Object Detection on COCO 2017

Image Classification Instance Segmentation +2

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

1 code implementation • 31 Aug 2023 • Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang

To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel.

Decision Making

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

1 code implementation • ICCV 2023 • Eslam Mohamed BAKR, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny

A human evaluation aligned with 95% of our evaluations on average was conducted to probe the effectiveness of HRS-Bench.

Fairness Text-to-Image Generation

ImageCaptioner$^2$: Image Captioner for Image Captioning Bias Amplification Assessment

no code implementations • 10 Apr 2023 • Eslam Mohamed BAKR, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny

In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers.

Image Captioning

For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

no code implementations • 10 Apr 2023 • Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao

Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.

Imitation Learning Reinforcement Learning (RL)

LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation

1 code implementation • 4 Apr 2023 • Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, QiXing Huang

Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors.

3D Object Detection object-detection +1

Attribute-Centric Compositional Text-to-Image Generation

no code implementations • 4 Jan 2023 • Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang

We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions.

Attribute Fairness +1

Evaluating Step-by-Step Reasoning through Symbolic Verification

1 code implementation • 16 Dec 2022 • Yi-Fan Zhang, HANLIN ZHANG, Li Erran Li, Eric Xing

Pre-trained language models (LMs) have shown remarkable reasoning performance using explanations or chain-of-thoughts (CoT) for in-context learning.

In-Context Learning

Improving self-supervised representation learning via sequential adversarial masking

no code implementations • 16 Dec 2022 • Dylan Sam, Min Bai, Tristan McKinney, Li Erran Li

Recent methods in self-supervised learning have demonstrated that masking-based pretext tasks extend beyond NLP, serving as useful pretraining objectives in computer vision.

Representation Learning Self-Supervised Learning

Policy Adaptation from Foundation Model Feedback

no code implementations • CVPR 2023 • Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang

When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations.

Decision Making

A General Purpose Neural Architecture for Geospatial Systems

no code implementations • 4 Nov 2022 • Nasim Rahaman, Martin Weiss, Frederik Träuble, Francesco Locatello, Alexandre Lacoste, Yoshua Bengio, Chris Pal, Li Erran Li, Bernhard Schölkopf

Geospatial Information Systems are used by researchers and Humanitarian Assistance and Disaster Response (HADR) practitioners to support a wide variety of important applications.

Disaster Response Earth Observation +2

Neural Attentive Circuits

no code implementations • 14 Oct 2022 • Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities.

Point Cloud Classification text-classification +1

Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

1 code implementation • 9 Jun 2022 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.

D4RL Model-based Reinforcement Learning +3

Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation

1 code implementation • 2 Feb 2022 • Yi-Fan Zhang, HANLIN ZHANG, Zachary C. Lipton, Li Erran Li, Eric P. Xing

Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical, where strong parametric assumptions are made but intractable for practical application.

POS Selection bias

A Causal Lens for Controllable Text Generation

no code implementations • NeurIPS 2021 • Zhiting Hu, Li Erran Li

Controllable text generation concerns two fundamental tasks of wide applications, namely generating text of given attributes (i.e., attribute-conditional generation), and minimally editing existing text to possess desired attributes (i.e., text attribute transfer).

Attribute Causal Inference +3

Vision Transformer with Deformable Attention

2 code implementations • CVPR 2022 • Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts which are beyond the region of interests.

Ranked #107 on Object Detection on COCO test-dev

Image Classification Object Detection +1

Learning to perceive objects by prediction

no code implementations • NeurIPS Workshop SVRHM 2021 • Tushar Arora, Li Erran Li, Ming Bo Cai

Infants develop the notion of objects without supervision.

Object Semantic Segmentation +1

CausalDyna: Improving Generalization of Dyna-style Reinforcement Learning via Counterfactual-Based Data Augmentation

no code implementations • 29 Sep 2021 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

Deep reinforcement learning agents trained in real-world environments with a limited diversity of object properties to learn manipulation tasks tend to suffer overfitting and fail to generalize to unseen testing environments.

counterfactual Data Augmentation +3

Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving

1 code implementation • ICCV 2021 • Xuanchi Ren, Tao Yang, Li Erran Li, Alexandre Alahi, Qifeng Chen

The ability to predict unseen vehicles is critical for safety in autonomous driving.

Autonomous Driving motion prediction +1

StruMonoNet: Structure-Aware Monocular 3D Prediction

no code implementations • CVPR 2021 • Zhenpei Yang, Li Erran Li, QiXing Huang

Monocular 3D prediction is one of the fundamental problems in 3D vision.

Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals

1 code implementation • CVPR 2021 • Kun Qian, Shilin Zhu, Xinyu Zhang, Li Erran Li

Vehicle detection with visual sensors like lidar and camera is one of the critical functions enabling autonomous driving.

Autonomous Driving

Correcting Automated and Manual Speech Transcription Errors using Warped Language Models

no code implementations • 26 Mar 2021 • Mahdi Namazifar, John Malik, Li Erran Li, Gokhan Tur, Dilek Hakkani Tür

Masked language models have revolutionized natural language processing systems in the past few years.

Language Modelling

Disentangled Recurrent Wasserstein Autoencoder

no code implementations • ICLR 2021 • Jun Han, Martin Renqiang Min, Ligong Han, Li Erran Li, Xuan Zhang

Learning disentangled representations leads to interpretable models and facilitates data generation with style transfer, which has been extensively studied on static data such as images in an unsupervised learning framework.

Disentanglement Style Transfer +1

Motion Forecasting with Unlikelihood Training

no code implementations • 1 Jan 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model.

Decoder Motion Forecasting +1

HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents

no code implementations • ICLR 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

Our model's learned representation leads to better and more semantically meaningful coverage of the trajectory distribution.

Motion Forecasting Trajectory Forecasting

3D Object Detection with Pointformer

1 code implementation • CVPR 2021 • Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang

In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.

3D Object Detection Object +2

Video Depth Estimation by Fusing Flow-to-Depth Proposals

1 code implementation • 30 Dec 2019 • Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li, Qifeng Chen

Our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module.

Depth Estimation Optical Flow Estimation

Deep Stereo using Adaptive Thin Volume Representation with Uncertainty Awareness

1 code implementation • CVPR 2020 • Shuo Cheng, Zexiang Xu, Shilin Zhu, Zhuwen Li, Li Erran Li, Ravi Ramamoorthi, Hao Su

In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions.

Ranked #13 on 3D Reconstruction on DTU

3D Reconstruction Point Clouds

Towards Safety-Aware Computing System Design in Autonomous Vehicles

no code implementations • 21 May 2019 • Hengyu Zhao, Yubo Zhang, Pingfan Meng, Hui Shi, Li Erran Li, Tiancheng Lou, Jishen Zhao

To address this issue, we propose a 'safety score' as a primary metric for measuring the level of safety in AV computing system design.

Autonomous Driving Management

Disentangled Deep Autoencoding Regularization for Robust Image Classification

no code implementations • 27 Feb 2019 • Zhenyu Duan, Martin Renqiang Min, Li Erran Li, Mingbo Cai, Yi Xu, Bingbing Ni

In spite of achieving revolutionary successes in machine learning, deep convolutional neural networks have been recently found to be vulnerable to adversarial attacks and difficult to generalize to novel test images with reasonably large geometric transformations.

Classification General Classification +2

Fast and Accurate Performance Analysis of LTE Radio Access Networks

no code implementations • 16 May 2016 • Anand Padmanabha Iyer, Ion Stoica, Mosharaf Chowdhury, Li Erran Li

Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and their real-time nature and dimensionality which makes it a natural fit for a popular analysis technique, machine learning (ML).

Feature Engineering Multi-Task Learning
