Search Results for author: Wanrong Zhu

Found 26 papers, 16 papers with code

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

1 code implementation 6 Jul 2024 Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures.

visual instruction following

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

1 code implementation 12 Jun 2024 Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, JianFeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

Multimodal Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics.

counterfactual Future prediction +1

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

1 code implementation 25 Apr 2024 An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, JianFeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image.

Visual Grounding Visual Question Answering +1

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

no code implementations 23 Apr 2024 Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun

Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability.

Instruction Following

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

no code implementations 17 Jan 2024 Wanrong Zhu, Zhipeng Lou, Ziyang Wei, Wei Biao Wu

We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate and allowing for high confidence level inference.

Stochastic Optimization Uncertainty Quantification

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

1 code implementation 12 Aug 2023 Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment.

Instruction Following

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

no code implementations 13 Jul 2023 Ziyang Wei, Wanrong Zhu, Wei Biao Wu

Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency.

valid

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

1 code implementation 12 Jul 2023 Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action.

Decision Making Natural Language Understanding +1

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation NeurIPS 2023 Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis Text-to-Image Generation

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

no code implementations 18 May 2023 Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.

Text Generation Text-to-Image Generation

Multimodal Procedural Planning via Dual Text-Image Prompting

1 code implementation 2 May 2023 Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang

The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.

Informativeness Text-to-Image Generation

Neuro-Symbolic Procedural Planning with Commonsense Prompting

no code implementations 6 Jun 2022 Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Procedural planning aims to implement complex high-level goals by decomposition into sequential simpler low-level steps.

Graph Sampling

End-to-end Dense Video Captioning as Sequence Generation

no code implementations COLING 2022 Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut

Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.

Ranked #3 on Dense Video Captioning on ViTT (CIDEr metric, using extra training data)

Dense Video Captioning Descriptive

Imagination-Augmented Natural Language Understanding

1 code implementation NAACL 2022 Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations.

Natural Language Understanding

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

no code implementations 10 Jun 2021 Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang

Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with text references.

nlg evaluation Text Generation

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

no code implementations EMNLP 2020 Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings.

Text Generation

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

1 code implementation EACL 2021 Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions and navigates a real-life urban environment.

Ranked #5 on Vision and Language Navigation on Touchdown Dataset (using extra training data)

Style Transfer Text Style Transfer +1

Online Covariance Matrix Estimation in Stochastic Gradient Descent

no code implementations 10 Feb 2020 Wanrong Zhu, Xi Chen, Wei Biao Wu

This approach fits in an online setting and takes full advantage of SGD: efficiency in computation and memory.

valid

Text Infilling

1 code implementation 1 Jan 2019 Wanrong Zhu, Zhiting Hu, Eric Xing

Recent years have seen remarkable progress of text generation in different contexts, such as the most common setting of generating text from scratch, and the emerging paradigm of retrieval-and-rewriting.

Retrieval Sentence +1
