Search Results for author: Qiuyuan Huang

Found 24 papers, 11 papers with code

Agent AI: Surveying the Horizons of Multimodal Interaction

1 code implementation • 7 Jan 2024 • Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao

To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions.

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

2 code implementations NeurIPS 2023 Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi

Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.

Instruction Following, Knowledge Distillation, +3

MindAgent: Emergent Gaming Interaction

no code implementations • 18 Sep 2023 • Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao

Large Language Models (LLMs) have the capacity to perform complex scheduling in a multi-agent system, and can coordinate these agents to complete sophisticated tasks that require extensive collaboration.

In-Context Learning, Scheduling

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

no code implementations • 1 May 2023 • Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT-4, DALL-E) to novel domains or scenarios for scene understanding and generation in the physical or virtual world.

Mixed Reality, Scene Generation, +1

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

no code implementations • 24 Feb 2023 • Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao

Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering.

Informativeness, Open-Domain Question Answering

Training Vision-Language Transformers from Captions

1 code implementation • 19 May 2022 • Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alex Hauptmann, Jianfeng Gao, Yonatan Bisk

Vision-Language Transformers can be learned without low-level human labels (e.g., class labels, bounding boxes, etc.).

KAT: A Knowledge Augmented Transformer for Vision-and-Language

1 code implementation NAACL 2022 Liangke Gui, Borui Wang, Qiuyuan Huang, Alex Hauptmann, Yonatan Bisk, Jianfeng Gao

The primary focus of recent work with large-scale transformers has been on optimizing the amount of information packed into the model's parameters.

Answer Generation, Retrieval, +1

Mapping Natural-language Problems to Formal-language Solutions Using Structured Neural Representations

2 code implementations ICML 2020 Kezhen Chen, Qiuyuan Huang, Hamid Palangi, Paul Smolensky, Kenneth D. Forbus, Jianfeng Gao

The encoder of TP-N2F employs TPR "binding" to encode natural-language symbolic structure in vector space, and the decoder uses TPR "unbinding" to generate, in symbolic space, a sequential program represented by relational tuples, each consisting of a relation (or operation) and a number of arguments.

Program Synthesis, Text Generation
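The TPR "binding"/"unbinding" operations mentioned in the abstract above have a simple linear-algebra core: binding takes the outer product of a filler vector with a role vector and superimposes the results, and unbinding contracts the resulting tensor with a role vector to recover a filler. A minimal NumPy sketch of this idea (not the authors' TP-N2F code; the dimensions and variable names are illustrative, and exact recovery assumes orthonormal role vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

d_f, d_r, n = 8, 8, 3  # filler dim, role dim, number of role/filler pairs

# Orthonormal role vectors (rows of an orthogonal matrix), so that
# unbinding recovers each filler exactly.
roles = np.linalg.qr(rng.standard_normal((d_r, d_r)))[0][:n]   # (n, d_r)
fillers = rng.standard_normal((n, d_f))                        # (n, d_f)

# TPR "binding": superimpose the outer products filler_i (x) role_i.
T = sum(np.outer(fillers[i], roles[i]) for i in range(n))      # (d_f, d_r)

# TPR "unbinding": contract the tensor with a role vector.
recovered = T @ roles[1]

assert np.allclose(recovered, fillers[1])
```

With non-orthonormal roles, unbinding instead uses a dual (pseudoinverse) role basis; the orthonormal case above keeps the sketch short.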

Natural- to formal-language generation using Tensor Product Representations

no code implementations • 25 Sep 2019 • Kezhen Chen, Qiuyuan Huang, Hamid Palangi, Paul Smolensky, Kenneth D. Forbus, Jianfeng Gao

Generating formal language represented by relational tuples, such as Lisp programs or mathematical expressions, from a natural-language input is an extremely challenging task, because it requires explicitly capturing discrete symbolic structural information from the input in order to generate the output.

Math, Program Synthesis, +1

REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning

1 code implementation IJCNLP 2019 Ming Jiang, Junjie Hu, Qiuyuan Huang, Lei Zhang, Jana Diesner, Jianfeng Gao

In this study, we present a fine-grained evaluation method REO for automatically measuring the performance of image captioning systems.

Image Captioning

Object-driven Text-to-Image Synthesis via Adversarial Training

1 code implementation CVPR 2019 Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao

In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes.

Image Generation, Object

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

no code implementations • 21 May 2018 • Qiuyuan Huang, Zhe Gan, Asli Celikyilmaz, Dapeng Wu, Jian-Feng Wang, Xiaodong He

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task.

reinforcement-learning, Reinforcement Learning (RL), +2

Turbo Learning for Captionbot and Drawingbot

no code implementations NeurIPS 2018 Qiuyuan Huang, Pengchuan Zhang, Dapeng Wu, Lei Zhang

In this paper, we study the problems of both image captioning and text-to-image generation, and present a novel turbo learning approach to jointly training an image-to-text generator (a.k.a.

Image Captioning, Text Generation, +1

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

1 code implementation • 3 Apr 2018 • Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun

By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages, and generates more discriminative captions.

Generative Adversarial Network

Attentive Tensor Product Learning

no code implementations • 20 Feb 2018 • Qiuyuan Huang, Li Deng, Dapeng Wu, Chang Liu, Xiaodong He

This paper proposes a new architecture - Attentive Tensor Product Learning (ATPL) - to represent grammatical structures in deep learning models.

Constituency Parsing, Image Captioning, +4

Structured Memory based Deep Model to Detect as well as Characterize Novel Inputs

no code implementations • 30 Jan 2018 • Pratik Prabhanjan Brahma, Qiuyuan Huang, Dapeng Wu

While deep learning has pushed the boundaries in various machine learning tasks, the current models are still far away from replicating many functions that a normal human brain can do.

Memorization

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

19 code implementations CVPR 2018 Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He

In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.

Generative Adversarial Network, Image-text matching, +2
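The attention-driven refinement described in the AttnGAN abstract above can be sketched generically: each image subregion attends over word features, and the resulting word-context vectors condition the next refinement stage on the most relevant words. A hedged NumPy sketch of this attention step (not the AttnGAN implementation; the shapes, names, and the `gamma` temperature are illustrative):

```python
import numpy as np

def word_attention(regions, words, gamma=4.0):
    """Word-level attention for attention-driven text-to-image refinement.

    regions: (n_regions, d) image subregion features
    words:   (n_words, d)   word features
    Returns (n_regions, d) word-context vectors, one per subregion.
    """
    scores = regions @ words.T * gamma            # (n_regions, n_words)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over words
    return attn @ words                           # weighted word features

rng = np.random.default_rng(0)
ctx = word_attention(rng.standard_normal((16, 32)),   # 16 subregions
                     rng.standard_normal((5, 32)))    # 5 words
assert ctx.shape == (16, 32)
```

A higher `gamma` sharpens the softmax, making each subregion focus on fewer words; each generation stage would then consume these context vectors alongside the image features.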

A Neural-Symbolic Approach to Design of CAPTCHA

no code implementations • 29 Oct 2017 • Qiuyuan Huang, Paul Smolensky, Xiaodong He, Li Deng, Dapeng Wu

To address this, this paper promotes image/visual-captioning-based CAPTCHAs, which are robust against machine-learning-based attacks.

BIG-bench Machine Learning, Image Captioning, +1

Tensor Product Generation Networks for Deep NLP Modeling

2 code implementations NAACL 2018 Qiuyuan Huang, Paul Smolensky, Xiaodong He, Li Deng, Dapeng Wu

We present a new approach to the design of deep networks for natural language processing (NLP), based on the general technique of Tensor Product Representations (TPRs) for encoding and processing symbol structures in distributed neural networks.

Caption Generation
