Search Results for author: Tsu-Jui Fu

Found 32 papers, 18 papers with code

Dynamic Video Segmentation Network

no code implementations CVPR 2018 Yu-Syuan Xu, Tsu-Jui Fu, Hsuan-Kung Yang, Chun-Yi Lee

We explore the use of a decision network to adaptively assign different frame regions to different networks based on a metric called expected confidence score.

Segmentation Video Segmentation +1

Adversarial Active Exploration for Inverse Dynamics Model Learning

no code implementations ICLR 2019 Zhang-Wei Hong, Tsu-Jui Fu, Tzu-Yun Shann, Yi-Hsiang Chang, Chun-Yi Lee

Our framework consists of a deep reinforcement learning (DRL) agent and an inverse dynamics model contesting with each other.

Imitation Learning

Visual Relationship Prediction via Label Clustering and Incorporation of Depth Information

no code implementations 9 Sep 2018 Hsuan-Kung Yang, An-Chieh Cheng, Kuan-Wei Ho, Tsu-Jui Fu, Chun-Yi Lee

The additional depth prediction path supplements the relationship prediction model in a way that bounding boxes or segmentation masks are unable to deliver.

Clustering Depth Estimation +5

Speed Reading: Learning to Read ForBackward via Shuttle

1 code implementation EMNLP 2018 Tsu-Jui Fu, Wei-Yun Ma

We present LSTM-Shuttle, which applies human speed reading techniques to natural language processing tasks for accurate and efficient comprehension.

Document Classification Document Summarization +8

GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction

1 code implementation ACL 2019 Tsu-Jui Fu, Peng-Hsuan Li, Wei-Yun Ma

In contrast to previous baselines, we consider the interaction between named entities and relations via a 2nd-phase relation-weighted GCN to better extract relations.

Joint Entity and Relation Extraction Relation

Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER

4 code implementations 29 Aug 2019 Peng-Hsuan Li, Tsu-Jui Fu, Wei-Yun Ma

We test the practical impacts of the deficiency on real-world NER datasets, OntoNotes 5.0 and WNUT 2017, with clear and consistent improvements over the baseline, up to 8.7% on some of the multi-token entity mentions.

NER

Why Attention? Analyzing and Remedying BiLSTM Deficiency in Modeling Cross-Context for NER

no code implementations 7 Oct 2019 Peng-Hsuan Li, Tsu-Jui Fu, Wei-Yun Ma

State-of-the-art approaches of NER have used sequence-labeling BiLSTM as a core module.

NER

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling

no code implementations 17 Nov 2019 Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang

In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.

counterfactual Counterfactual Reasoning +2

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

1 code implementation EACL 2021 Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

Outdoor vision-and-language navigation (VLN) is such a task where an agent follows natural language instructions and navigates a real-life urban environment.

Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)

Style Transfer Text Style Transfer +1

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning

1 code implementation EMNLP 2020 Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang

In this paper, we introduce a Self-Supervised Counterfactual Reasoning (SSCR) framework that incorporates counterfactual thinking to overcome data scarcity.

counterfactual Counterfactual Reasoning

H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction

1 code implementation Findings (ACL) 2021 Jhih-wei Chen, Tsu-Jui Fu, Chen-Kang Lee, Wei-Yun Ma

Experiments on SemEval-2010 and TACRED were conducted with controlled FN ratios that randomly turn the relations of training and validation instances into negatives to generate FN instances.

Denoising Relation +1

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

no code implementations 28 Jan 2021 Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song

Creating presentation materials requires complex multimodal reasoning skills to summarize key concepts and arrange them in a logical and visually pleasing manner.

Document Summarization Multimodal Reasoning +2

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

no code implementations EACL 2021 An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang

Recent advances in language and vision push forward the research of captioning a single image to describing visual differences between image pairs.

Image Captioning

M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers

no code implementations CVPR 2022 Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang

LBVE contains two features: 1) the scenario of the source video is preserved instead of generating a completely different video; 2) the semantic is presented differently in the target video, and all changes are controlled by the given instruction.

Video Editing Video Understanding

Semi-Supervised Policy Initialization for Playing Games with Language Hints

1 code implementation NAACL 2021 Tsu-Jui Fu, William Yang Wang

Using natural language as a hint can supply an additional reward for playing sparse-reward games.

Language-Driven Image Style Transfer

1 code implementation 1 Jun 2021 Tsu-Jui Fu, Xin Eric Wang, William Yang Wang

We propose contrastive language visual artist (CLVA) that learns to extract visual semantics from style instructions and accomplish LDAST by the patch-wise style discriminator.

Style Transfer

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation 24 Nov 2021 Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Question Answering Retrieval +5

ULN: Towards Underspecified Vision-and-Language Navigation

1 code implementation 18 Oct 2022 Weixi Feng, Tsu-Jui Fu, Yujie Lu, William Yang Wang

Vision-and-Language Navigation (VLN) is a task to guide an embodied agent moving to a target position using language instructions.

Vision and Language Navigation

CPL: Counterfactual Prompt Learning for Vision and Language Models

no code implementations 19 Oct 2022 Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.

counterfactual Visual Question Answering

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1 code implementation 9 Dec 2022 Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.

Attribute Image Generation

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

no code implementations 18 May 2023 Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.

Text Generation Text-to-Image Generation

EDIS: Entity-Driven Image Search over Multimodal Web Content

1 code implementation 23 May 2023 Siqi Liu, Weixi Feng, Tsu-Jui Fu, Wenhu Chen, William Yang Wang

Making image retrieval methods practical for real-world search applications requires significant progress in dataset scales, entity comprehension, and multimodal information fusion.

Image Retrieval Retrieval

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation NeurIPS 2023 Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis Text-to-Image Generation

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

1 code implementation 12 Jul 2023 Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action.

Decision Making Natural Language Understanding +1

Guiding Instruction-based Image Editing via Multimodal Large Language Models

2 code implementations 29 Sep 2023 Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.

Image Manipulation Response Generation

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

no code implementations 11 Apr 2024 Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it poses certain limitations: it is constrained by the pre-trained fixed visual encoder and fails to perform well on broader tasks.

Language Modelling Large Language Model +1

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler

no code implementations ECCV 2020 Tsu-Jui Fu, Xin Eric Wang, Matthew F. Peterson, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang

In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.

counterfactual Counterfactual Reasoning +2
