1 code implementation • ACL 2022 • Xingyu Fu, Ben Zhou, Ishaan Chandratreya, Carl Vondrick, Dan Roth
Images often convey more to human eyes than their pixels alone, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.
no code implementations • 9 Jan 2025 • Xingyu Fu, Minqian Liu, Zhengyuan Yang, John Corring, Yijuan Lu, Jianwei Yang, Dan Roth, Dinei Florencio, Cha Zhang
ReFocus substantially improves performance on all tasks over GPT-4o without visual editing, yielding an average gain of 11.0% on table tasks and 6.8% on chart tasks.
no code implementations • 17 Jun 2024 • Bangzheng Li, Ben Zhou, Xingyu Fu, Fei Wang, Dan Roth, Muhao Chen
One popular approach is to use perplexity as a measure of a model's familiarity with the prompt.
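A minimal sketch of this measure, assuming a Hugging Face causal LM (the model name below is illustrative, not necessarily the one used in the paper): perplexity is the exponentiated mean token-level negative log-likelihood of the prompt, so lower values indicate higher familiarity.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def prompt_perplexity(prompt: str, model_name: str = "gpt2") -> float:
    """Perplexity of `prompt` under a causal LM: exp(mean token NLL)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return float(torch.exp(loss))

# Lower perplexity suggests the model is more "familiar" with the prompt.
print(prompt_perplexity("The capital of France is Paris."))
```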
no code implementations • 13 Jun 2024 • Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Ranjay Krishna
In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad.
1 code implementation • 13 Jun 2024 • Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen
We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs.
no code implementations • 11 Jun 2024 • Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth
We present a novel task and benchmark, Commonsense-T2I, for evaluating the ability of text-to-image (T2I) generation models to produce images that align with real-life commonsense.
no code implementations • 18 Apr 2024 • Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna
We introduce Blink, a new benchmark for multimodal large language models (MLLMs) that focuses on core visual perception abilities not found in other evaluations.
1 code implementation • 16 Nov 2023 • Bangzheng Li, Ben Zhou, Fei Wang, Xingyu Fu, Dan Roth, Muhao Chen
During the construction of the evidence, we purposefully replace semantic clues (entities) that could directly lead to the correct answer with distractor clues (evidence) that do not directly yield the answer but instead require a chain-like reasoning process.
2 code implementations • 2 Oct 2023 • Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen
Recently, a myriad of conditional image generation and editing models have been developed to serve different downstream tasks, including text-to-image generation, text-guided image editing, subject-driven image generation, control-guided image generation, etc.
no code implementations • 31 Aug 2023 • Xingyu Fu, Mingze Xi
This paper proposes and validates a deep-learning-based approach that enables AR applications to accurately predict keystrokes from a user-perspective RGB video stream, as captured by any AR headset.
no code implementations • 30 May 2023 • Xingyu Fu, Sheng Zhang, Gukyeong Kwon, Pramuditha Perera, Henghui Zhu, Yuhao Zhang, Alexander Hanbo Li, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Patrick Ng, Dan Roth, Bing Xiang
The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge.
no code implementations • 24 May 2023 • Xingyu Fu, Ben Zhou, Sihao Chen, Mark Yatskar, Dan Roth
We propose the Dynamic Clue Bottleneck Model (DCLUB), a method designed toward an inherently interpretable VQA system.
1 code implementation • 1 Mar 2022 • Xingyu Fu, Ben Zhou, Ishaan Preetam Chandratreya, Carl Vondrick, Dan Roth
For example, in Figure 1, we can find a way to identify the news articles related to the picture through segment-wise understanding of the signs, the buildings, the crowds, and more.
1 code implementation • 30 Aug 2021 • Xingyu Fu, Fengfeng Zhou, Dheeraj Peddireddy, Zhengyang Kang, Martin Byung-Guk Jun, Vaneet Aggarwal
In this work, we present a Boundary Oriented Graph Embedding (BOGE) approach that enables a Graph Neural Network (GNN) to serve as a general surrogate model for regressing physical fields and solving boundary value problems.
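The BOGE-specific embedding is not detailed in this snippet; as a hedged illustration of the surrogate idea only, the sketch below regresses a scalar field at every node of a mesh graph with a generic two-layer message-passing network (all names and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class MeanAggLayer(nn.Module):
    """Generic mean-aggregation message passing (not the BOGE embedding)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: row-normalized (N, N) adjacency with self-loops.
        return torch.relu(self.lin(adj @ x))

class FieldSurrogate(nn.Module):
    """Predicts a scalar physical field (e.g., stress) at each mesh node."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.g1 = MeanAggLayer(in_dim, hidden)
        self.g2 = MeanAggLayer(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        return self.head(self.g2(self.g1(x, adj), adj)).squeeze(-1)

# Toy usage: node features might encode coordinates and boundary conditions.
N, F = 100, 8
x, adj = torch.randn(N, F), torch.eye(N)  # identity adj is a placeholder
pred_field = FieldSurrogate(F)(x, adj)    # (N,) predicted field values
```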
1 code implementation • 2 Sep 2020 • Jiuniu Wang, Wenjia Xu, Xingyu Fu, Yang Wei, Li Jin, Ziyan Chen, Guangluan Xu, Yirong Wu
This model enhances the question answering system in the multi-document scenario from three aspects: model structure, optimization goal, and training method, corresponding to Multilayer Attention (MA), Cross Evidence (CE), and Adversarial Training (AT), respectively.
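Of the three components, Adversarial Training is the most self-contained; a common embedding-level formulation (an FGM-style step, shown here as an assumption rather than the paper's exact variant) perturbs the word-embedding matrix along the loss gradient:

```python
import torch

def adversarial_step(model, loss_fn, batch, embedding: torch.nn.Parameter,
                     epsilon: float = 1.0):
    """One FGM-style adversarial step on the embedding matrix (illustrative)."""
    # 1) Clean forward/backward to obtain gradients on the embeddings.
    loss_fn(model(batch)).backward()

    grad = embedding.grad.detach()
    norm = grad.norm()
    if norm > 0:
        # 2) Perturb embeddings along the normalized gradient direction.
        backup = embedding.data.clone()
        embedding.data.add_(epsilon * grad / norm)

        # 3) Adversarial forward/backward accumulates robustness gradients.
        loss_fn(model(batch)).backward()

        # 4) Restore clean embeddings before the optimizer step.
        embedding.data.copy_(backup)
```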
1 code implementation • 2 Sep 2020 • Jiuniu Wang, Wenjia Xu, Xingyu Fu, Guangluan Xu, Yirong Wu
Under such circumstances, how to make full use of the information captured by word embeddings requires more in-depth research.
1 code implementation • EMNLP 2020 • Xingyu Fu, Weijia Shi, Xiaodong Yu, Zian Zhao, Dan Roth
Cross-lingual Entity Linking (XEL), the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia, has seen a lot of research in recent years, with a range of promising techniques.
no code implementations • WS 2019 • Ahmed El-Kishky, Xingyu Fu, Aseel Addawood, Nahil Sobh, Clare Voss, Jiawei Han
In this paper, we tackle the problem of "root extraction" from words in the Semitic language family.
1 code implementation • 27 Sep 2018 • Zheng Xie, Xingyu Fu, JinYuan Yu
In this project, we combine the AlphaGo algorithm with Curriculum Learning to crack the game of Gomoku.
no code implementations • 3 Sep 2018 • Jiuniu Wang, Xingyu Fu, Guangluan Xu, Yirong Wu, Ziyan Chen, Yang Wei, Li Jin
Meanwhile, we construct A3Net for the WebQA dataset.
no code implementations • 5 Jun 2018 • XingYu Fu, JinHong Du, Yifeng Guo, Mingwen Liu, Tao Dong, XiuWen Duan
The effectiveness of the stock selection strategy is validated on the Chinese stock market in both statistical and practical aspects, showing that: 1) Stacking outperforms the other models, reaching an AUC score of 0.972; 2) the Genetic Algorithm picks a subset of 114 features, and the prediction performance of all models remains almost unchanged after the selection procedure, which suggests some features are indeed redundant; 3) LR and DNN are radical (risk-seeking) models, RF is a risk-neutral model, and Stacking lies somewhere between DNN and RF.
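As a hedged sketch of the Stacking setup (synthetic data; base learners chosen to echo the LR/RF/DNN models named above; the 0.972 AUC is the paper's reported figure, not reproduced by this toy script):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 114 GA-selected stock features.
X, y = make_classification(n_samples=2000, n_features=114, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("dnn", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner over base outputs
)
stack.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))
```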
no code implementations • 1 May 2018 • Yifeng Guo, Xingyu Fu, Yuyan Shi, Mingwen Liu
We propose a new Portfolio Management method, termed the Robust Log-Optimal Strategy (RLOS), which improves on the General Log-Optimal Strategy (GLOS) by approximating its objective function with a quadratic Taylor expansion.
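For concreteness, one plausible form of that approximation (notation assumed here: $w$ the portfolio weights, $r$ the vector of net returns, $\mu = \mathbb{E}[r]$), using $\log(1+x) \approx x - x^2/2$:

```latex
% Assumed notation: w = portfolio weights, r = net returns, mu = E[r].
% GLOS objective and its second-order Taylor expansion around w^T r = 0:
\max_{w}\; \mathbb{E}\!\left[\log\!\left(1 + w^\top r\right)\right]
\;\approx\;
\max_{w}\; w^\top \mu \;-\; \tfrac{1}{2}\, w^\top \mathbb{E}\!\left[r r^\top\right] w
```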
no code implementations • 26 Feb 2018 • XingYu Fu, ZiYi Yang, XiuWen Duan
To model the randomness of language spreading, we propose the Batch Markov Monte Carlo Simulation with Migration (BMMCSM) algorithm, in which each agent is treated as a language stack.
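The paper's exact dynamics are not given in this snippet; the sketch below only illustrates the stated idea that each agent is a stack of acquired languages, updated batch-by-batch via assumed Markov transitions, with occasional migration between regions (all parameters are hypothetical):

```python
import random

LANGS = ["A", "B", "C"]
P = {"A": [0.8, 0.15, 0.05],  # assumed language-transition probabilities
     "B": [0.1, 0.8, 0.1],
     "C": [0.05, 0.15, 0.8]}

# Two regions, each starting monolingual; an agent is a stack of languages
# and speaks the language on top of its stack.
regions = [[["A"] for _ in range(50)], [["B"] for _ in range(50)]]

def step(batch: int = 20):
    for agents in regions:
        # Batch Markov update: sampled agents may acquire a new language,
        # which is pushed onto the agent's stack.
        for agent in random.sample(agents, batch):
            new = random.choices(LANGS, weights=P[agent[-1]])[0]
            if new != agent[-1]:
                agent.append(new)
    # Migration: swap one random agent between regions, stack and all.
    i, j = (random.randrange(50) for _ in range(2))
    regions[0][i], regions[1][j] = regions[1][j], regions[0][i]

for _ in range(100):
    step()
print("Languages now spoken in region 0:", {a[-1] for a in regions[0]})
```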