no code implementations • 1 Jun 2023 • Jialu Wang, Xinyue Gabby Liu, Zonglin Di, Yang Liu, Xin Eric Wang
In this work, we seek to measure the more complex human biases that exist in the task of text-to-image generation.
no code implementations • 29 May 2023 • Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang
In an era where images and visual content dominate our digital landscape, the ability to manipulate and personalize these images has become a necessity.
1 code implementation • 24 May 2023 • Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang
When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves performance comparable to human users in designing visual layouts for numerical and spatial correctness.
no code implementations • 23 May 2023 • Yue Fan, Kaizhi Zheng, Jing Gu, Xin Eric Wang
Furthermore, we propose a novel task-oriented multimodal response generation model that can see and respond, named SeeRee, as the navigation helper to guide the task performer in embodied tasks.
no code implementations • 18 May 2023 • Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.
no code implementations • 18 May 2023 • Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation.
1 code implementation • 18 May 2023 • Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang
Existing automatic evaluation of text-to-image synthesis can only provide an image-text matching score, without considering object-level compositionality, which results in poor correlation with human judgments.
1 code implementation • 2 May 2023 • Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang
The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.
no code implementations • 2 May 2023 • Zhen Zhang, Jialu Wang, Xin Eric Wang
Pre-trained vision and language models such as CLIP have witnessed remarkable success in connecting images and texts with a primary focus on English texts.
no code implementations • 30 Apr 2023 • Xuehai He, Xin Eric Wang
Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous amounts of data implicitly and cannot utilize structured input data directly.
no code implementations • 30 Jan 2023 • Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang
Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments.
1 code implementation • 9 Dec 2022 • Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang
In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.
no code implementations • 27 Nov 2022 • Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie, Xin Eric Wang
However, since the local data is inaccessible to the server under federated learning, attackers may easily poison the training data of the local client to build a backdoor in the agent without notice.
no code implementations • 25 Nov 2022 • Kenan Jiang, Xuehai He, Ruize Xu, Xin Eric Wang
Contrastive Language-Image Pretraining (CLIP) has demonstrated great zero-shot performance for matching images and text.
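As a concrete illustration of the zero-shot matching this line of work builds on, here is a minimal sketch using the Hugging Face `transformers` CLIP API; the checkpoint, image path, and candidate captions are illustrative choices, not taken from the paper:

```python
# Minimal sketch of CLIP zero-shot image-text matching.
# Checkpoint, image path, and captions are illustrative.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # any local image
texts = ["a photo of a dog", "a photo of a cat"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax gives
# zero-shot probabilities over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```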
no code implementations • 19 Oct 2022 • Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.
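To make the idea concrete, below is a toy CoOp-style sketch of prompt tuning: only a small set of learnable context vectors receives gradients, while the pre-trained model stays frozen. The encoder here is a stand-in linear layer (a real setup would use CLIP's frozen text encoder), and all dimensions are illustrative:

```python
# Toy sketch of prompt tuning: learnable context vectors are prepended to
# fixed class-name embeddings; only the context vectors are trained.
import torch
import torch.nn as nn

d, n_ctx, n_cls = 512, 4, 10

# Learnable prompt context (the only trainable parameters).
ctx = nn.Parameter(torch.randn(n_ctx, d) * 0.02)

# Frozen stand-ins for a pre-trained model: class-name token embeddings
# and a text encoder (in practice, CLIP's text encoder).
cls_emb = torch.randn(n_cls, 1, d)
encoder = nn.Linear(d, d)
for p in encoder.parameters():
    p.requires_grad_(False)

def encode_prompts():
    # [n_cls, n_ctx + 1, d]: shared context + per-class name token
    prompts = torch.cat([ctx.unsqueeze(0).expand(n_cls, -1, -1), cls_emb], dim=1)
    return encoder(prompts).mean(dim=1)  # pooled text features

optimizer = torch.optim.Adam([ctx], lr=2e-3)
image_feats = torch.randn(8, d)               # stand-in image features
labels = torch.randint(0, n_cls, (8,))

logits = image_feats @ encode_prompts().t()   # image-text similarity
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```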
1 code implementation • 7 Oct 2022 • Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
Recent advances in text-to-image synthesis make it possible to visualize machine imaginations for a given context.
no code implementations • 10 Sep 2022 • Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang
In this paper, we propose DAVIS, an Unseen Discrepancy Anticipating Vision and Language Navigation agent that learns to generalize to unseen environments by encouraging test-time visual consistency.
no code implementations • 28 Aug 2022 • Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang
Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc.
1 code implementation • 30 Jun 2022 • Jialu Wang, Xin Eric Wang, Yang Liu
A variety of fairness constraints have been proposed in the literature to mitigate group-level statistical bias.
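For context, one common group-level statistical constraint of this kind is demographic parity; below is a minimal sketch of measuring the parity gap (the arrays and variable names are illustrative, not from the paper):

```python
# Minimal sketch: demographic parity gap, a common group-level
# statistical bias measure. Inputs are toy arrays.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # model decisions
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # protected attribute

rate_a = y_pred[group == 0].mean()  # positive rate for group 0: 0.75
rate_b = y_pred[group == 1].mean()  # positive rate for group 1: 0.25

# Demographic parity asks these rates to be (near-)equal.
print("parity gap:", abs(rate_a - rate_b))    # 0.5
```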
no code implementations • 17 Jun 2022 • Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, Xin Eric Wang
We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation.
no code implementations • 6 Jun 2022 • Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
Procedural planning aims to implement complex high-level goals by decomposing them into sequences of simpler low-level steps.
2 code implementations • 24 May 2022 • Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, Xin Eric Wang
To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), the task of navigating a drone via natural language conversation.
1 code implementation • NAACL 2022 • Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations.
2 code implementations • 29 Mar 2022 • Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang
In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task.
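As one illustrative instance of such a strategy, the sketch below trains a small residual bottleneck adapter on top of a frozen backbone; this is a generic adapter sketch under assumed dimensions, not the specific design studied in the paper:

```python
# Sketch of one parameter-efficient strategy: a residual bottleneck
# adapter trained while the backbone stays frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual update

dim, n_cls = 768, 100
backbone = nn.Linear(dim, dim)          # stand-in for a frozen ViT block
for p in backbone.parameters():
    p.requires_grad_(False)

adapter = Adapter(dim)
head = nn.Linear(dim, n_cls)            # task head is also trained

x = torch.randn(4, dim)                 # stand-in patch/CLS features
logits = head(adapter(backbone(x)))
# Only adapter + head parameters go to the optimizer.
optimizer = torch.optim.AdamW(
    list(adapter.parameters()) + list(head.parameters()), lr=1e-3)
```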
1 code implementation • 28 Mar 2022 • Kaiwen Zhou, Xin Eric Wang
Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world.
no code implementations • Findings (ACL) 2022 • Tianyi Luo, Rui Meng, Xin Eric Wang, Yang Liu
Research Replication Prediction (RRP) is the task of predicting whether a published research result can be replicated or not.
1 code implementation • CVPR 2022 • Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang
To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
1 code implementation • ACL 2022 • Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang
A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks.
no code implementations • 2 Dec 2021 • Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haocheng Shi, Jun Xiao, Yueting Zhuang, William Yang Wang
Such a setting can help explain the decisions of captioning models and prevent the model from hallucinating object words in its description.
1 code implementation • EMNLP 2021 • Jialu Wang, Yang Liu, Xin Eric Wang
Internet search affects people's cognition of the world, so mitigating biases in search results and learning fair models is imperative for social good.
1 code implementation • 21 Jun 2021 • Swati Jindal, Xin Eric Wang
However, adopting such generative models to new domains while maintaining their ability to provide fine-grained control over different image attributes, e.g., gaze and head pose directions, has been a challenging problem.
no code implementations • Findings (ACL) 2022 • Jialu Wang, Yang Liu, Xin Eric Wang
To answer these questions, we view language as the fairness recipient and introduce two new fairness notions, multilingual individual fairness and multilingual group fairness, for pre-trained multimodal models.
no code implementations • 10 Jun 2021 • Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang
Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with text references.
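Concretely, the token-level and embedding-level comparisons referred to here look roughly like the following sketch, pairing BLEU (via NLTK) with a sentence-embedding cosine; the sentences and embedding model are illustrative choices:

```python
# Sketch of conventional NLG evaluation: a token-level metric (BLEU)
# and an embedding-level one (cosine similarity of sentence embeddings).
from nltk.translate.bleu_score import sentence_bleu
from sentence_transformers import SentenceTransformer, util

reference = "a man is riding a horse on the beach"
candidate = "a person rides a horse along the shore"

# Token-level: n-gram overlap with the reference.
bleu = sentence_bleu([reference.split()], candidate.split())

# Embedding-level: similarity in a learned sentence-embedding space.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([reference, candidate], convert_to_tensor=True)
cos = util.cos_sim(emb[0], emb[1]).item()

print(f"BLEU: {bleu:.3f}  cosine: {cos:.3f}")
```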
1 code implementation • 8 Jun 2021 • Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu
Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task.
1 code implementation • 1 Jun 2021 • Tsu-Jui Fu, Xin Eric Wang, William Yang Wang
We propose contrastive language visual artist (CLVA) that learns to extract visual semantics from style instructions and accomplish LDAST by the patch-wise style discriminator.
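A patch-wise discriminator of this kind is in the spirit of PatchGAN-style discriminators; the generic sketch below (not the CLVA architecture) scores realism per local patch rather than per image:

```python
# Generic PatchGAN-style sketch: the discriminator outputs a grid of
# per-patch scores instead of a single per-image score.
import torch
import torch.nn as nn

patch_disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, stride=1, padding=1),  # 1 score per patch
)

img = torch.randn(1, 3, 256, 256)       # toy stylized image
patch_scores = patch_disc(img)          # [1, 1, H', W'] grid of scores
print(patch_scores.shape)               # each cell judges a local patch
```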
no code implementations • CVPR 2022 • Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang
LBVE has two features: 1) the scenario of the source video is preserved instead of generating a completely different video; 2) the semantics are presented differently in the target video, and all changes are controlled by the given instruction.
1 code implementation • NAACL 2022 • Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang
Results show that indoor navigation agents refer to both object and direction tokens when making decisions.
no code implementations • EACL 2021 • An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang
Recent advances in language and vision push forward the research of captioning a single image to describing visual differences between image pairs.
no code implementations • EMNLP 2020 • Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Jiannan Xiang, Xin Eric Wang, William Yang Wang
Vision-and-Language Navigation (VLN) is a natural language grounding task where an agent learns to follow language instructions and navigate to specified destinations in real-world environments.
Ranked #3 on Vision and Language Navigation on Touchdown Dataset
1 code implementation • EMNLP 2020 • Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang
In this paper, we introduce a Self-Supervised Counterfactual Reasoning (SSCR) framework that incorporates counterfactual thinking to overcome data scarcity.
1 code implementation • EACL 2021 • Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
Outdoor vision-and-language navigation (VLN) is the task in which an agent follows natural language instructions and navigates a real-life urban environment.
Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)
1 code implementation • ECCV 2020 • Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi
Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.
no code implementations • 17 Nov 2019 • Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang
In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.
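A toy sketch of the adversarial min-max idea (not the APS architecture): a sampler distribution over candidate paths is pushed toward paths where the navigator currently does badly, while the navigator trains on the sampled paths. All features and the loss are stand-ins:

```python
# Toy sketch of adversarial path sampling: the sampler is updated to
# prefer paths on which the navigator's loss is high (REINFORCE-style),
# while the navigator minimizes its loss on the sampled paths.
import torch
import torch.nn as nn

n_paths, d = 16, 32
path_feats = torch.randn(n_paths, d)            # stand-in path features
sampler_logits = nn.Parameter(torch.zeros(n_paths))
navigator = nn.Linear(d, 1)                     # stand-in navigator

opt_nav = torch.optim.Adam(navigator.parameters(), lr=1e-2)
opt_smp = torch.optim.Adam([sampler_logits], lr=1e-2)

for step in range(100):
    dist = torch.distributions.Categorical(logits=sampler_logits)
    idx = dist.sample((4,))                     # sample candidate paths
    nav_loss = navigator(path_feats[idx]).pow(2).mean(dim=1)  # toy loss

    # Navigator: minimize its loss on the sampled paths.
    opt_nav.zero_grad()
    nav_loss.mean().backward()
    opt_nav.step()

    # Sampler: maximize the navigator's loss via REINFORCE.
    opt_smp.zero_grad()
    (-dist.log_prob(idx) * nav_loss.detach()).mean().backward()
    opt_smp.step()
```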
2 code implementations • 24 Oct 2019 • An Yan, Xin Eric Wang, Jiangtao Feng, Lei Li, William Yang Wang
Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics.