no code implementations • ECCV 2020 • Tsu-Jui Fu, Xin Eric Wang, Matthew F. Peterson, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang
In particular, we present a model-agnostic adversarial path sampler (APS) that learns, based on navigation performance, to sample challenging paths that force the navigator to improve.
1 code implementation • 9 Dec 2022 • Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang
In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.
1 code implementation • 23 Nov 2022 • Tsu-Jui Fu, Licheng Yu, Ning Zhang, Cheng-Yang Fu, Jong-Chyi Su, William Yang Wang, Sean Bell
Inspired by this, we introduce a novel task, text-guided video completion (TVC), which requests the model to generate a video from partial frames guided by an instruction.
Ranked #3 on Video Prediction on BAIR Robot Pushing
no code implementations • 19 Oct 2022 • Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Prompt tuning is a new few-shot transfer learning technique that tunes only the learnable prompt for pre-trained vision-and-language models such as CLIP.
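The core idea can be sketched in a few lines: the backbone's token embeddings stay frozen, and a small matrix of soft prompt vectors prepended to them is the only trainable parameter. All dimensions and names below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical dimensions chosen for illustration only.
embed_dim = 8   # embedding size of the frozen backbone
n_prompt = 4    # number of learnable soft-prompt tokens
seq_len = 6     # length of the frozen tokenized text (e.g., a class name)

rng = np.random.default_rng(0)

# Frozen token embeddings (would come from the pre-trained model).
frozen_tokens = rng.normal(size=(seq_len, embed_dim))

# The only trainable parameters in prompt tuning: the soft prompt.
soft_prompt = rng.normal(size=(n_prompt, embed_dim))

# The prompt-tuned input prepends the soft prompt to the frozen
# embeddings; during training, gradients update soft_prompt only.
prompt_input = np.concatenate([soft_prompt, frozen_tokens], axis=0)

print(prompt_input.shape)  # (10, 8): n_prompt + seq_len tokens
```

Because only `n_prompt * embed_dim` parameters are trained, this is why prompt tuning suits few-shot transfer: the frozen model does the heavy lifting.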
1 code implementation • 18 Oct 2022 • Weixi Feng, Tsu-Jui Fu, Yujie Lu, William Yang Wang
Vision-and-Language Navigation (VLN) is a task in which an embodied agent is guided to a target position using language instructions.
1 code implementation • 4 Sep 2022 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
Masked visual modeling (MVM) has been recently proven effective for visual pre-training.
Ranked #1 on Video Question Answering on LSMDC-MC
1 code implementation • 24 Nov 2021 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.
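The masking step of MVM can be sketched as follows: a fraction of the discrete visual-token ids for a video clip is replaced with a mask id, and the model is trained to predict the original tokens at those positions. The mask id, ratio, and token values here are illustrative assumptions, not the paper's settings.

```python
import random

MASK_ID = -1      # placeholder id for the [MASK] token (illustrative)
mask_ratio = 0.4  # fraction of visual tokens to hide (assumed value)

def mask_visual_tokens(token_ids, ratio=mask_ratio, seed=0):
    """Randomly replace a fraction of discrete visual-token ids with MASK_ID.

    Returns the corrupted sequence and the positions the model must predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(token_ids) * ratio))
    positions = sorted(rng.sample(range(len(token_ids)), n_mask))
    corrupted = list(token_ids)
    for p in positions:
        corrupted[p] = MASK_ID
    return corrupted, positions

# Toy sequence of discrete visual-token ids for one clip.
tokens = [101, 57, 902, 13, 640, 88, 305, 77, 412, 6]
corrupted, targets = mask_visual_tokens(tokens)
# The pre-training loss is computed only at the masked positions.
```

This mirrors masked language modeling, except the prediction targets are discrete visual tokens rather than words.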
1 code implementation • 1 Jun 2021 • Tsu-Jui Fu, Xin Eric Wang, William Yang Wang
We propose a contrastive language visual artist (CLVA) that learns to extract visual semantics from style instructions and accomplishes LDAST via a patch-wise style discriminator.
1 code implementation • NAACL 2021 • Tsu-Jui Fu, William Yang Wang
Using natural language as a hint can supply an additional reward for playing sparse-reward games.
no code implementations • CVPR 2022 • Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang
LBVE has two features: 1) the scenario of the source video is preserved rather than generating a completely different video; 2) the semantics are presented differently in the target video, with all changes controlled by the given instruction.
no code implementations • EACL 2021 • An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang
Recent advances in language and vision push forward the research of captioning a single image to describing visual differences between image pairs.
no code implementations • 28 Jan 2021 • Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song
Creating presentation materials requires complex multimodal reasoning skills to summarize key concepts and arrange them in a logical and visually pleasing manner.
1 code implementation • Findings (ACL) 2021 • Jhih-wei Chen, Tsu-Jui Fu, Chen-Kang Lee, Wei-Yun Ma
Experiments on SemEval-2010 and TACRED were conducted with controlled FN ratios that randomly turn the relations of training and validation instances into negatives to generate FN instances.
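The controlled false-negative setup described above can be sketched directly: a chosen fraction of the positive relation labels is flipped to the negative class before training. The label names and ratio below are illustrative assumptions, not the paper's exact protocol.

```python
import random

NEGATIVE = "no_relation"  # negative-class label name (illustrative)

def inject_false_negatives(labels, fn_ratio, seed=0):
    """Turn a controlled fraction of positive relation labels into negatives,
    simulating false-negative (FN) noise in the training/validation sets."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y != NEGATIVE]
    n_flip = int(len(positives) * fn_ratio)
    flipped = set(rng.sample(positives, n_flip))
    return [NEGATIVE if i in flipped else y for i, y in enumerate(labels)]

# Toy labels: four positives and one genuine negative.
labels = ["founded_by", "no_relation", "born_in", "works_at", "capital_of"]
noisy = inject_false_negatives(labels, fn_ratio=0.5)
# Two of the four positive labels are now (false) negatives.
```

Varying `fn_ratio` is what makes the FN level controllable in such experiments.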
1 code implementation • EMNLP 2020 • Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang
In this paper, we introduce a Self-Supervised Counterfactual Reasoning (SSCR) framework that incorporates counterfactual thinking to overcome data scarcity.
1 code implementation • EACL 2021 • Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions and navigates a real-life urban environment.
Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)
no code implementations • 7 Oct 2019 • Peng-Hsuan Li, Tsu-Jui Fu, Wei-Yun Ma
State-of-the-art approaches to NER have used a sequence-labeling BiLSTM as a core module.
4 code implementations • 29 Aug 2019 • Peng-Hsuan Li, Tsu-Jui Fu, Wei-Yun Ma
We test the practical impact of the deficiency on real-world NER datasets, OntoNotes 5.0 and WNUT 2017, and observe clear and consistent improvements over the baseline, up to 8.7% on some of the multi-token entity mentions.
Ranked #17 on Named Entity Recognition (NER) on WNUT 2017
1 code implementation • ACL 2019 • Tsu-Jui Fu, Peng-Hsuan Li, Wei-Yun Ma
In contrast to previous baselines, we consider the interaction between named entities and relations via a 2nd-phase relation-weighted GCN to better extract relations.
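One plausible reading of a relation-weighted graph convolution is that phase-one relation scores act as edge weights when mixing entity representations in phase two. The sketch below is a minimal numpy illustration under that assumption; the dimensions, normalization, and activation are all guesses, not the paper's architecture.

```python
import numpy as np

def relation_weighted_gcn_layer(node_feats, relation_scores, weight):
    """One graph-convolution step where each edge (i, j) is weighted by the
    relation score predicted between entities i and j in phase one.

    node_feats:      (n, d) entity representations
    relation_scores: (n, n) positive phase-one relation weights
    weight:          (d, d) learnable projection
    """
    # Row-normalize so each node aggregates a convex combination of neighbors.
    norm = relation_scores / relation_scores.sum(axis=1, keepdims=True)
    agg = norm @ node_feats               # relation-weighted neighborhood mixing
    return np.maximum(agg @ weight, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))                  # 3 entities, 4-dim features
scores = rng.uniform(0.1, 1.0, size=(3, 3))  # toy phase-1 relation weights
W = rng.normal(size=(4, 4))
out = relation_weighted_gcn_layer(h, scores, W)

print(out.shape)  # (3, 4)
```

The key point is that entity aggregation is modulated by predicted relations, so relation extraction and entity representations can inform each other.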
no code implementations • ICLR 2019 • Zhang-Wei Hong, Tsu-Jui Fu, Tzu-Yun Shann, Yi-Hsiang Chang, Chun-Yi Lee
Our framework consists of a deep reinforcement learning (DRL) agent and an inverse dynamics model contesting with each other.
1 code implementation • EMNLP 2018 • Tsu-Jui Fu, Wei-Yun Ma
We present LSTM-Shuttle, which applies human speed reading techniques to natural language processing tasks for accurate and efficient comprehension.
no code implementations • 9 Sep 2018 • Hsuan-Kung Yang, An-Chieh Cheng, Kuan-Wei Ho, Tsu-Jui Fu, Chun-Yi Lee
The additional depth prediction path supplements the relationship prediction model in a way that bounding boxes or segmentation masks are unable to deliver.
no code implementations • CVPR 2018 • Yu-Syuan Xu, Tsu-Jui Fu, Hsuan-Kung Yang, Chun-Yi Lee
We explore the use of a decision network to adaptively assign different frame regions to different networks based on a metric called expected confidence score.