no code implementations • EMNLP 2021 • Zi-Yi Dou, Nanyun Peng
Phrase grounding aims to map textual phrases to their associated image regions, which can be a prerequisite for multimodal reasoning and can benefit tasks that require identifying objects based on language.
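As a rough illustration of the task framing only (not the method in this paper), phrase grounding can be cast as scoring each candidate image region against the phrase and returning the best match; the sketch below uses random stand-in embeddings in place of real text and region encoders.

```python
# Minimal sketch (illustrative, not from the paper): phrase grounding framed as
# picking the image region whose embedding is most similar to the phrase embedding.
import numpy as np

def ground_phrase(phrase_emb: np.ndarray, region_embs: np.ndarray) -> int:
    """Return the index of the region with highest cosine similarity to the phrase."""
    phrase = phrase_emb / np.linalg.norm(phrase_emb)
    regions = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    return int(np.argmax(regions @ phrase))

rng = np.random.default_rng(0)
phrase_emb = rng.normal(size=256)          # stand-in for a text-encoder output
region_embs = rng.normal(size=(36, 256))   # stand-in for 36 detected region features
print("grounded region index:", ground_phrase(phrase_emb, region_embs))
```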
no code implementations • 10 Oct 2024 • WenBo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz, Pan Lu, Kai-Wei Chang, Nanyun Peng
In this paper, we introduce a multimodal retrieval-augmented generation benchmark, MRAG-Bench, in which we systematically identify and categorize scenarios where visually augmented knowledge is better than textual knowledge, for instance, cases where additional images from varying viewpoints are more informative.
no code implementations • 7 Aug 2024 • Zi-Yi Dou, Xitong Yang, Tushar Nagarajan, Huiyu Wang, Jing Huang, Nanyun Peng, Kris Kitani, Fu-Jen Chu
We present EMBED (Egocentric Models Built with Exocentric Data), a method designed to transform exocentric video-language data for egocentric video representation learning.
1 code implementation • 3 Jun 2024 • Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, Nanyun Peng
Finetuning language agents with reasoning-action trajectories is effective, but obtaining these trajectories from human annotations or stronger models is costly and sometimes impractical.
1 code implementation • 29 May 2024 • WenBo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang
This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources?
no code implementations • 27 Apr 2024 • Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang
Vision-language models have become increasingly powerful for tasks that require an understanding of both visual and linguistic elements, bridging the gap between these modalities.
1 code implementation • 22 Apr 2024 • Haoyi Qiu, WenBo Hu, Zi-Yi Dou, Nanyun Peng
Our work also highlights the critical balance between faithfulness and coverage of model outputs, and encourages future work to address hallucinations in LVLMs while keeping their outputs informative.
1 code implementation • 2 Nov 2023 • Te-Lin Wu, Zi-Yi Dou, Qingyuan Hu, Yu Hou, Nischal Reddy Chandra, Marjorie Freedman, Ralph M. Weischedel, Nanyun Peng
Multimodal counterfactual reasoning is a vital yet challenging ability for AI systems.
1 code implementation • 24 May 2023 • Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng
Model-based evaluation metrics (e.g., CLIPScore and GPTScore) have demonstrated decent correlations with human judgments in various language generation tasks.
no code implementations • 23 May 2023 • Zi-Yi Dou, Feng Gao, Nanyun Peng
In this paper, we introduce a masked path modeling (MPM) objective, which pretrains an agent using self-collected data for downstream navigation tasks.
1 code implementation • CVPR 2023 • Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, JianFeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.
Ranked #4 on Instance Segmentation on ADE20K val (using extra training data)
1 code implementation • NeurIPS 2022 • Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, JianFeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann Lecun, Nanyun Peng, Jianfeng Gao, Lijuan Wang
Vision-language (VL) pre-training has recently received considerable attention.
Ranked #1 on Phrase Grounding on Flickr30k Entities Dev
1 code implementation • NAACL 2022 • Zi-Yi Dou, Nanyun Peng
Speaker-follower models have proven effective in vision-and-language navigation, where a speaker model synthesizes new instructions to augment the training data for a follower navigation model.
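As a loose sketch of the augmentation loop described above (hypothetical helper names, not the paper's code), a speaker model labels unannotated trajectories with synthetic instructions that are then mixed into the follower's training data:

```python
# Minimal sketch: a "speaker" labels unannotated trajectories with synthetic
# instructions, which are then added to the "follower" navigator's training data.
from typing import Callable, List, Tuple

Trajectory = List[str]               # e.g., a sequence of actions or viewpoints
Example = Tuple[str, Trajectory]     # (instruction, trajectory) pair

def augment_with_speaker(
    speaker: Callable[[Trajectory], str],   # trajectory -> synthetic instruction
    unlabeled_paths: List[Trajectory],
    labeled_data: List[Example],
) -> List[Example]:
    synthetic = [(speaker(path), path) for path in unlabeled_paths]
    return labeled_data + synthetic

# Toy stand-in speaker for demonstration only.
toy_speaker = lambda path: "go " + ", then ".join(path)
print(augment_with_speaker(toy_speaker, [["forward", "left", "stop"]], []))
```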
1 code implementation • 1 Jan 2022 • Zi-Yi Dou, Nanyun Peng
In this paper, we instead focus on better utilizing the implicit knowledge stored in pre-trained language models.
3 code implementations • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.
Ranked #20 on Cross-Modal Retrieval on COCO 2014 (using extra training data)
1 code implementation • NAACL 2021 • Yixin Liu, Zi-Yi Dou, PengFei Liu
Although some recent works show potential complementarity among different state-of-the-art systems, few have investigated this problem in text summarization.
1 code implementation • ACL 2021 • PengFei Liu, Jinlan Fu, Yang Xiao, Weizhe Yuan, Shuaicheng Chang, Junqi Dai, Yixin Liu, Zihuiwen Ye, Zi-Yi Dou, Graham Neubig
In this paper, we present a new conceptualization and implementation of NLP evaluation: the ExplainaBoard, which, in addition to inheriting the functionality of the standard leaderboard, also allows researchers to (i) diagnose strengths and weaknesses of a single system (e.g., what is the best-performing system bad at?)
3 code implementations • EACL 2021 • Zi-Yi Dou, Graham Neubig
In addition, we demonstrate that we are able to train multilingual word aligners that can obtain robust performance on different language pairs.
1 code implementation • NAACL 2021 • Zi-Yi Dou, PengFei Liu, Hiroaki Hayashi, Zhengbao Jiang, Graham Neubig
Neural abstractive summarization models are flexible and can produce coherent summaries, but they are sometimes unfaithful and can be difficult to control.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Yiran Chen, PengFei Liu, Ming Zhong, Zi-Yi Dou, Danqing Wang, Xipeng Qiu, Xuanjing Huang
In this paper, we perform an in-depth analysis of characteristics of different datasets and investigate the performance of different summarization models under a cross-dataset setting, in which a summarizer trained on one corpus will be evaluated on a range of out-of-domain corpora.
no code implementations • EMNLP (NLP-COVID19) 2020 • Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federman, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur
Further, the team is converting the test and development data into translation memories (TMXs) that can be used by localizers from and to any of the languages.
1 code implementation • WS 2020 • Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov
The model uses reinforcement learning to directly optimize a bilingual semantic similarity metric between the summaries generated in a target language and gold summaries in a source language.
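A minimal sketch of the kind of objective described here, assuming a REINFORCE-style update in which the reward is a bilingual similarity score between the sampled target-language summary and the gold source-language summary (the names and the reward value are illustrative, not the paper's exact formulation):

```python
# Minimal sketch: policy-gradient loss whose reward is a stand-in bilingual
# semantic-similarity score; log_probs would come from the summarization model.
import torch

def rl_loss(log_probs: torch.Tensor, reward: float, baseline: float = 0.0) -> torch.Tensor:
    """log_probs: per-token log-probabilities of the sampled summary."""
    advantage = reward - baseline
    return -(advantage * log_probs.sum())

log_probs = torch.log(torch.tensor([0.6, 0.4, 0.7], requires_grad=True))
reward = 0.82  # e.g., similarity(sampled target-language summary, gold source summary)
loss = rl_loss(log_probs, reward)
loss.backward()
```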
1 code implementation • EMNLP 2020 • Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig
Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance.
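For readers unfamiliar with the technique, a minimal sketch of iterative back-translation follows (stand-in training and translation functions, not the paper's implementation): a target-to-source model translates monolingual target text into synthetic source sentences, the synthetic pairs are added to the parallel data, and the forward model is retrained.

```python
# Minimal sketch of iterative back-translation with stand-in functions.
from typing import Callable, List, Tuple

ParallelData = List[Tuple[str, str]]  # (source, target) pairs

def iterative_back_translation(
    train: Callable[[ParallelData, str], Callable[[str], str]],  # returns a translator
    parallel: ParallelData,
    mono_target: List[str],
    rounds: int = 2,
) -> Callable[[str], str]:
    data = list(parallel)
    forward = train(data, "src->tgt")
    for _ in range(rounds):
        backward = train([(t, s) for s, t in data], "tgt->src")
        synthetic = [(backward(t), t) for t in mono_target]   # back-translate monolingual data
        data = list(parallel) + synthetic
        forward = train(data, "src->tgt")                     # retrain on augmented data
    return forward

# Toy stand-in trainer for demonstration: the "translator" just echoes its input.
toy_train = lambda data, direction: (lambda text: text)
model = iterative_back_translation(toy_train, [("hallo", "hello")], ["good morning"])
```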
1 code implementation • WS 2019 • Zi-Yi Dou, Xinyi Wang, Junjie Hu, Graham Neubig
We then use these learned domain differentials to adapt models for the target task accordingly.
no code implementations • IJCNLP 2019 • Zi-Yi Dou, Keyi Yu, Antonios Anastasopoulos
Learning general representations of text is a fundamental problem for many natural language understanding (NLU) tasks.
1 code implementation • IJCNLP 2019 • Zi-Yi Dou, Junjie Hu, Antonios Anastasopoulos, Graham Neubig
The recent success of neural machine translation models relies on the availability of high quality, in-domain data.
no code implementations • NAACL 2019 • Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, Zhaopeng Tu
Multi-head attention is appealing for its ability to jointly extract different types of information from multiple representation subspaces.
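For reference, a minimal sketch of standard multi-head attention (the baseline formulation, not the variant this paper proposes), showing how inputs are projected into several subspaces, attended over independently, and concatenated:

```python
# Minimal multi-head self-attention sketch (standard formulation).
import torch

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads: int) -> torch.Tensor:
    B, T, D = x.shape
    d_head = D // n_heads
    def split(t):  # (B, T, D) -> (B, heads, T, d_head)
        return t.view(B, T, n_heads, d_head).transpose(1, 2)
    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d_head ** 0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(B, T, D)  # concatenate the heads
    return out @ w_o

D, H = 64, 8
x = torch.randn(2, 10, D)
w = [torch.randn(D, D) / D ** 0.5 for _ in range(4)]
print(multi_head_attention(x, *w, n_heads=H).shape)  # torch.Size([2, 10, 64])
```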
2 code implementations • NAACL 2019 • Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, Xinyi Wang, John Wieting
In this paper, we describe compare-mt, a tool for holistic analysis and comparison of the results of systems for language generation tasks such as machine translation.
no code implementations • 15 Feb 2019 • Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Long-Yue Wang, Shuming Shi, Tong Zhang
With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation.
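A minimal sketch of one generic layer-aggregation scheme, assuming a learned softmax-weighted sum over all layer outputs (illustrative only; not necessarily the aggregation studied in this paper):

```python
# Minimal sketch: fuse the outputs of all layers with learned softmax weights
# instead of using only the top layer.
from typing import List
import torch
import torch.nn as nn

class LayerAggregation(nn.Module):
    def __init__(self, n_layers: int):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(n_layers))  # one weight per layer

    def forward(self, layer_outputs: List[torch.Tensor]) -> torch.Tensor:
        weights = torch.softmax(self.scores, dim=0)
        return sum(w * h for w, h in zip(weights, layer_outputs))

agg = LayerAggregation(n_layers=6)
layers = [torch.randn(2, 10, 512) for _ in range(6)]
print(agg(layers).shape)  # torch.Size([2, 10, 512])
```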
no code implementations • EMNLP 2018 • Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Shuming Shi, Tong Zhang
Advanced neural machine translation (NMT) models generally implement the encoder and decoder as multiple layers, which allows systems to model complex functions and capture complicated linguistic structures.
no code implementations • EMNLP 2018 • Zi-Yi Dou, Zhi-Hao Zhou, Shu-Jian Huang
Bilingual lexicon extraction has been studied for decades and most previous methods have relied on parallel corpora or bilingual dictionaries.
2 code implementations • ECCV 2018 • Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez
While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient.
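A minimal sketch of the idea of input-dependent depth, assuming a per-input gate that can skip a residual block entirely (illustrative only; training such a hard gate in practice requires a relaxation or reinforcement learning, which this sketch omits):

```python
# Minimal sketch: a residual block that a per-input gate can skip entirely,
# so "easy" inputs take a shallower path through the network.
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.gate = nn.Sequential(  # tiny gate network producing a skip decision
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = (self.gate(x) > 0.5).float().view(-1, 1, 1, 1)  # hard per-input decision
        return x + g * self.body(x)  # g == 0 -> the block is skipped (identity path)

block = SkippableBlock(16)
print(block(torch.randn(4, 16, 32, 32)).shape)  # torch.Size([4, 16, 32, 32])
```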
no code implementations • 8 Nov 2017 • Zi-Yi Dou
Generative Adversarial Networks (GANs), as a framework for estimating generative models via an adversarial process, have attracted huge attention and have proven to be powerful in a variety of tasks.
no code implementations • LREC 2018 • Zi-Yi Dou, Hao Zhou, Shu-Jian Huang, Xin-yu Dai, Jia-Jun Chen
However, Scheduled Sampling has certain limitations, and we propose two dynamic oracle-based methods to improve it.
no code implementations • EMNLP 2017 • Zi-Yi Dou
Document-level sentiment classification is a fundamental problem which aims to predict a user's overall sentiment about a product in a document.
Ranked #8 on Sentiment Analysis on User and product information