Search Results for author: Wanrong Zhu

Found 22 papers, 13 papers with code

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

no code implementations • 17 Jan 2024 • Wanrong Zhu, Zhipeng Lou, Ziyang Wei, Wei Biao Wu

We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate and allowing for high confidence level inference.

Stochastic Optimization Uncertainty Quantification

Paper
Add Code

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

2 code implementations • 13 Nov 2023 • An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, JianFeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang

We first benchmark MM-Navigator on our collected iOS screen dataset.

Action Localization

106

Paper
Code

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

1 code implementation • 12 Aug 2023 • Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment.

Instruction Following

Paper
Code

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

2 code implementations • 2 Aug 2023 • Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt

We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters.

Ranked #14 on Visual Question Answering (VQA) on InfiMM-Eval

Visual Question Answering

3,449

Paper
Code

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

no code implementations • 13 Jul 2023 • Ziyang Wei, Wanrong Zhu, Wei Biao Wu

Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency.

valid

Paper
Add Code

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

1 code implementation • 12 Jul 2023 • Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action.

Decision Making Natural Language Understanding +1

Paper
Code

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation • NeurIPS 2023 • Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis Text-to-Image Generation

233

Paper
Code

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

no code implementations • 18 May 2023 • Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.

Text Generation Text-to-Image Generation

Paper
Add Code

Multimodal Procedural Planning via Dual Text-Image Prompting

1 code implementation • 2 May 2023 • Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang

The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.

Informativeness Text-to-Image Generation

Paper
Code

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

2 code implementations • NeurIPS 2023 • Wanrong Zhu, Jack Hessel, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi

We release Multimodal C4, an augmentation of the popular text-only C4 corpus with images interleaved.

Few-Shot Learning

857

Paper
Code

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

1 code implementation • NeurIPS 2023 • Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.

Few-Shot Learning GSM8K +5

Paper
Code

CLIP also Understands Text: Prompting CLIP for Phrase Understanding

no code implementations • 11 Oct 2022 • An Yan, Jiacheng Li, Wanrong Zhu, Yujie Lu, William Yang Wang, Julian McAuley

However, the application of its text encoder solely for text understanding has been less explored.

Clustering Transfer Learning

Paper
Add Code

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation

1 code implementation • 7 Oct 2022 • Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Recent advances in text-to-image synthesis make it possible to visualize machine imaginations for a given context.

Concept-To-Text Generation Image Generation +1

Paper
Code

Neuro-Symbolic Procedural Planning with Commonsense Prompting

no code implementations • 6 Jun 2022 • Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Procedural planning aims to implement complex high-level goals by decomposition into sequential simpler low-level steps.

Graph Sampling

Paper
Add Code

Imagination-Augmented Natural Language Understanding

1 code implementation • NAACL 2022 • Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations.

Natural Language Understanding

Paper
Code

End-to-end Dense Video Captioning as Sequence Generation

no code implementations • COLING 2022 • Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut

Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.

Ranked #3 on Dense Video Captioning on ViTT (CIDEr metric, using extra training data)

Dense Video Captioning Descriptive

Paper
Add Code

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

no code implementations • 10 Jun 2021 • Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang

Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with text references.

nlg evaluation Text Generation

Paper
Add Code

Diagnosing Vision-and-Language Navigation: What Really Matters

1 code implementation • NAACL 2022 • Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang

Results show that indoor navigation agents refer to both object and direction tokens when making decisions.

Object Vision and Language Navigation

Paper
Code

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

no code implementations • EMNLP 2020 • Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings.

Text Generation

Paper
Add Code

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

1 code implementation • EACL 2021 • Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

Outdoor vision-and-language navigation (VLN) is such a task where an agent follows natural language instructions and navigates a real-life urban environment.

Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)

Style Transfer Text Style Transfer +1

Paper
Code

Online Covariance Matrix Estimation in Stochastic Gradient Descent

no code implementations • 10 Feb 2020 • Wanrong Zhu, Xi Chen, Wei Biao Wu

This approach fits in an online setting and takes full advantage of SGD: efficiency in computation and memory.

valid

Paper
Add Code

Text Infilling

1 code implementation • 1 Jan 2019 • Wanrong Zhu, Zhiting Hu, Eric Xing

Recent years have seen remarkable progress of text generation in different contexts, such as the most common setting of generating text from scratch, and the emerging paradigm of retrieval-and-rewriting.

Retrieval Sentence +1

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.