Search Results for author: Tsu-Jui Fu

Found 32 papers, 18 papers with code

Dynamic Video Segmentation Network

no code implementations CVPR 2018 Yu-Syuan Xu, Tsu-Jui Fu, Hsuan-Kung Yang, Chun-Yi Lee

We explore the use of a decision network to adaptively assign different frame regions to different networks based on a metric called expected confidence score.

Segmentation Video Segmentation +1

Adversarial Active Exploration for Inverse Dynamics Model Learning

no code implementations ICLR 2019 Zhang-Wei Hong, Tsu-Jui Fu, Tzu-Yun Shann, Yi-Hsiang Chang, Chun-Yi Lee

Our framework consists of a deep reinforcement learning (DRL) agent and an inverse dynamics model contesting with each other.

Imitation Learning

Visual Relationship Prediction via Label Clustering and Incorporation of Depth Information

no code implementations 9 Sep 2018 Hsuan-Kung Yang, An-Chieh Cheng, Kuan-Wei Ho, Tsu-Jui Fu, Chun-Yi Lee

The additional depth prediction path supplements the relationship prediction model in a way that bounding boxes or segmentation masks are unable to deliver.

Clustering Depth Estimation +5

Speed Reading: Learning to Read ForBackward via Shuttle

1 code implementation EMNLP 2018 Tsu-Jui Fu, Wei-Yun Ma

We present LSTM-Shuttle, which applies human speed reading techniques to natural language processing tasks for accurate and efficient comprehension.

Document Classification Document Summarization +8

GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction

1 code implementation ACL 2019 Tsu-Jui Fu, Peng-Hsuan Li, Wei-Yun Ma

In contrast to previous baselines, we consider the interaction between named entities and relations via a 2nd-phase relation-weighted GCN to better extract relations.

Joint Entity and Relation Extraction Relation

Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER

4 code implementations 29 Aug 2019 Peng-Hsuan Li, Tsu-Jui Fu, Wei-Yun Ma

We test the practical impacts of the deficiency on real-world NER datasets, OntoNotes 5.0 and WNUT 2017, with clear and consistent improvements over the baseline, up to 8.7% on some of the multi-token entity mentions.

NER

Why Attention? Analyzing and Remedying BiLSTM Deficiency in Modeling Cross-Context for NER

no code implementations 7 Oct 2019 Peng-Hsuan Li, Tsu-Jui Fu, Wei-Yun Ma

State-of-the-art approaches of NER have used sequence-labeling BiLSTM as a core module.

NER

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling

no code implementations 17 Nov 2019 Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang

In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.

counterfactual Counterfactual Reasoning +2

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

1 code implementation EACL 2021 Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

Outdoor vision-and-language navigation (VLN) is such a task where an agent follows natural language instructions and navigates a real-life urban environment.

Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)

Style Transfer Text Style Transfer +1

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning

1 code implementation EMNLP 2020 Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang

In this paper, we introduce a Self-Supervised Counterfactual Reasoning (SSCR) framework that incorporates counterfactual thinking to overcome data scarcity.

counterfactual Counterfactual Reasoning

H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction

1 code implementation Findings (ACL) 2021 Jhih-wei Chen, Tsu-Jui Fu, Chen-Kang Lee, Wei-Yun Ma

Experiments on SemEval-2010 and TACRED were conducted with controlled FN ratios that randomly turn the relations of training and validation instances into negatives to generate FN instances.

Denoising Relation +1

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

no code implementations 28 Jan 2021 Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song

Creating presentation materials requires complex multimodal reasoning skills to summarize key concepts and arrange them in a logical and visually pleasing manner.

Document Summarization Multimodal Reasoning +2

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

no code implementations EACL 2021 An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang

Recent advances in language and vision push forward the research of captioning a single image to describing visual differences between image pairs.

Image Captioning

M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers

no code implementations CVPR 2022 Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang

LBVE contains two features: 1) the scenario of the source video is preserved instead of generating a completely different video; 2) the semantic is presented differently in the target video, and all changes are controlled by the given instruction.

Video Editing Video Understanding

Semi-Supervised Policy Initialization for Playing Games with Language Hints

1 code implementation NAACL 2021 Tsu-Jui Fu, William Yang Wang

Using natural language as a hint can supply an additional reward for playing sparse-reward games.

Language-Driven Image Style Transfer

1 code implementation 1 Jun 2021 Tsu-Jui Fu, Xin Eric Wang, William Yang Wang

We propose contrastive language visual artist (CLVA) that learns to extract visual semantics from style instructions and accomplish LDAST by the patch-wise style discriminator.

Style Transfer

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation 24 Nov 2021 Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Question Answering Retrieval +5

ULN: Towards Underspecified Vision-and-Language Navigation

1 code implementation 18 Oct 2022 Weixi Feng, Tsu-Jui Fu, Yujie Lu, William Yang Wang

Vision-and-Language Navigation (VLN) is a task to guide an embodied agent moving to a target position using language instructions.

Vision and Language Navigation

CPL: Counterfactual Prompt Learning for Vision and Language Models

no code implementations 19 Oct 2022 Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.

counterfactual Visual Question Answering

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1 code implementation 9 Dec 2022 Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.

Attribute Image Generation

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

no code implementations 18 May 2023 Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.

Text Generation Text-to-Image Generation

EDIS: Entity-Driven Image Search over Multimodal Web Content

1 code implementation 23 May 2023 Siqi Liu, Weixi Feng, Tsu-Jui Fu, Wenhu Chen, William Yang Wang

Making image retrieval methods practical for real-world search applications requires significant progress in dataset scales, entity comprehension, and multimodal information fusion.

Image Retrieval Retrieval

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation NeurIPS 2023 Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis Text-to-Image Generation

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

1 code implementation 12 Jul 2023 Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action.

Decision Making Natural Language Understanding +1

Guiding Instruction-based Image Editing via Multimodal Large Language Models

2 code implementations 29 Sep 2023 Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.

Image Manipulation Response Generation

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

no code implementations 11 Apr 2024 Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it poses certain limitations: it is constrained by the pre-trained fixed visual encoder and fails to perform well on broader tasks.

Language Modelling Large Language Model +1

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler

no code implementations ECCV 2020 Tsu-Jui Fu, Xin Eric Wang, Matthew F. Peterson, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang

In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.

counterfactual Counterfactual Reasoning +2
