no code implementations • NAACL (maiworkshop) 2021 • Han Ding, Li Erran Li, Zhiting Hu, Yi Xu, Dilek Hakkani-Tur, Zheng Du, Belinda Zeng
Recent vision-language understanding approaches adopt a multi-modal transformer pre-training and finetuning paradigm.
no code implementations • Findings (EMNLP) 2021 • Zhan Shi, Hui Liu, Martin Renqiang Min, Christopher Malon, Li Erran Li, Xiaodan Zhu
Image captioning systems are expected to have the ability to combine individual concepts when describing scenes with concept combinations that are not observed during training.
2 code implementations • 9 Oct 2024 • Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
no code implementations • CVPR 2024 • Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan
Learning commonsense reasoning from visual contexts and scenes in the real world is a crucial step toward advanced artificial intelligence.
no code implementations • 21 Mar 2024 • Nasim Rahaman, Martin Weiss, Manuel Wüthrich, Yoshua Bengio, Li Erran Li, Chris Pal, Bernhard Schölkopf
This work addresses the buyer's inspection paradox for information markets.
no code implementations • 19 Mar 2024 • Yao Wei, Martin Renqiang Min, George Vosselman, Li Erran Li, Michael Ying Yang
Recent progress in object shape generation with generative models such as diffusion models has increased shape fidelity.
no code implementations • 6 Mar 2024 • John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai
As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning.
1 code implementation • 9 Feb 2024 • Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, QiXing Huang, Li Erran Li
By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities.
no code implementations • 12 Jan 2024 • Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li
Affordance grounding refers to the task of finding the area of an object with which one can interact.
no code implementations • 4 Oct 2023 • Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang
Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood.
1 code implementation • 4 Sep 2023 • Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang
On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that are beyond the regions of interest.
Ranked #4 on Object Detection on COCO 2017
1 code implementation • 31 Aug 2023 • Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang
To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel.
1 code implementation • ICCV 2023 • Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny
A human evaluation aligned with 95% of our evaluations on average was conducted to probe the effectiveness of HRS-Bench.
no code implementations • 10 Apr 2023 • Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao
Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.
no code implementations • 10 Apr 2023 • Eslam Mohamed Bakr, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny
In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers.
1 code implementation • 4 Apr 2023 • Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, QiXing Huang
Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors.
no code implementations • 4 Jan 2023 • Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang
We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions.
1 code implementation • 16 Dec 2022 • Yi-Fan Zhang, Hanlin Zhang, Li Erran Li, Eric Xing
Pre-trained language models (LMs) have shown remarkable reasoning performance using explanations or chain-of-thought (CoT) prompting for in-context learning.
no code implementations • 16 Dec 2022 • Dylan Sam, Min Bai, Tristan McKinney, Li Erran Li
Recent methods in self-supervised learning have demonstrated that masking-based pretext tasks extend beyond NLP, serving as useful pretraining objectives in computer vision.
no code implementations • CVPR 2023 • Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang
When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations.
no code implementations • 4 Nov 2022 • Nasim Rahaman, Martin Weiss, Frederik Träuble, Francesco Locatello, Alexandre Lacoste, Yoshua Bengio, Chris Pal, Li Erran Li, Bernhard Schölkopf
Geospatial Information Systems are used by researchers and Humanitarian Assistance and Disaster Response (HADR) practitioners to support a wide variety of important applications.
no code implementations • 14 Oct 2022 • Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas
Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities.
1 code implementation • 9 Jun 2022 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny
In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.
1 code implementation • 2 Feb 2022 • Yi-Fan Zhang, Hanlin Zhang, Zachary C. Lipton, Li Erran Li, Eric P. Xing
Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical, making strong parametric assumptions that are intractable in practical applications.
no code implementations • NeurIPS 2021 • Zhiting Hu, Li Erran Li
Controllable text generation concerns two fundamental tasks of wide applications, namely generating text of given attributes (i.e., attribute-conditional generation), and minimally editing existing text to possess desired attributes (i.e., text attribute transfer).
2 code implementations • CVPR 2022 • Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang
On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that are beyond the regions of interest.
Ranked #108 on Object Detection on COCO test-dev
no code implementations • NeurIPS Workshop SVRHM 2021 • Tushar Arora, Li Erran Li, Ming Bo Cai
Infants develop the notion of objects without supervision.
no code implementations • 29 Sep 2021 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny
Deep reinforcement learning agents trained in real-world environments with a limited diversity of object properties to learn manipulation tasks tend to suffer from overfitting and fail to generalize to unseen testing environments.
1 code implementation • ICCV 2021 • Xuanchi Ren, Tao Yang, Li Erran Li, Alexandre Alahi, Qifeng Chen
The ability to predict unseen vehicles is critical for safety in autonomous driving.
no code implementations • CVPR 2021 • Zhenpei Yang, Li Erran Li, QiXing Huang
Monocular 3D prediction is one of the fundamental problems in 3D vision.
1 code implementation • CVPR 2021 • Kun Qian, Shilin Zhu, Xinyu Zhang, Li Erran Li
Vehicle detection with visual sensors like lidar and camera is one of the critical functions enabling autonomous driving.
no code implementations • 26 Mar 2021 • Mahdi Namazifar, John Malik, Li Erran Li, Gokhan Tur, Dilek Hakkani Tür
Masked language models have revolutionized natural language processing systems in the past few years.
no code implementations • ICLR 2021 • Jun Han, Martin Renqiang Min, Ligong Han, Li Erran Li, Xuan Zhang
Learning disentangled representations leads to interpretable models and facilitates data generation with style transfer, which has been extensively studied on static data such as images in an unsupervised learning framework.
no code implementations • 1 Jan 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny
We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model.
no code implementations • ICLR 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny
Our model's learned representation leads to better and more semantically meaningful coverage of the trajectory distribution.
1 code implementation • CVPR 2021 • Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang
In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.
1 code implementation • 30 Dec 2019 • Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li, Qifeng Chen
Our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module.
1 code implementation • CVPR 2020 • Shuo Cheng, Zexiang Xu, Shilin Zhu, Zhuwen Li, Li Erran Li, Ravi Ramamoorthi, Hao Su
In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions.
Ranked #14 on 3D Reconstruction on DTU
no code implementations • 21 May 2019 • Hengyu Zhao, Yubo Zhang, Pingfan Meng, Hui Shi, Li Erran Li, Tiancheng Lou, Jishen Zhao
To address this issue, we propose a 'safety score' as a primary metric for measuring the level of safety in AV computing system design.
no code implementations • 27 Feb 2019 • Zhenyu Duan, Martin Renqiang Min, Li Erran Li, Mingbo Cai, Yi Xu, Bingbing Ni
In spite of achieving revolutionary successes in machine learning, deep convolutional neural networks have been recently found to be vulnerable to adversarial attacks and difficult to generalize to novel test images with reasonably large geometric transformations.
no code implementations • 16 May 2016 • Anand Padmanabha Iyer, Ion Stoica, Mosharaf Chowdhury, Li Erran Li
Our choice of this domain is influenced by its commonalities with several other domains that produce real-time data, our access to a large live dataset, and the data's real-time nature and dimensionality, which make it a natural fit for a popular analysis technique, machine learning (ML).