Search Results for author: Zhengyuan Yang

Found 19 papers, 8 papers with code

Scaling Up Vision-Language Pre-training for Image Captioning

no code implementations 24 Nov 2021 Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.

Image Captioning

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

no code implementations 23 Nov 2021 Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose UNICORN, a vision-language (VL) model that unifies text generation and bounding box prediction into a single architecture.

Image Captioning · Language Modelling +5
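The unification described above rests on representing a bounding box as a short sequence of discrete tokens, so one text decoder can emit either words or coordinates. A minimal sketch of that quantization step, assuming an illustrative bin count and token naming (not the paper's exact vocabulary):

```python
# Quantize normalized box coordinates into discrete "coordinate tokens"
# so boxes and text share one output space. Bin count is an assumption.
NUM_BINS = 1000

def box_to_tokens(box, image_w, image_h, num_bins=NUM_BINS):
    """Quantize (x1, y1, x2, y2) in pixels into coordinate tokens."""
    x1, y1, x2, y2 = box
    norm = [x1 / image_w, y1 / image_h, x2 / image_w, y2 / image_h]
    return [f"<bin_{min(int(v * num_bins), num_bins - 1)}>" for v in norm]

def tokens_to_box(tokens, image_w, image_h, num_bins=NUM_BINS):
    """Invert the quantization (accurate up to bin resolution)."""
    bins = [int(t[len("<bin_"):-1]) for t in tokens]
    centers = [(b + 0.5) / num_bins for b in bins]
    return (centers[0] * image_w, centers[1] * image_h,
            centers[2] * image_w, centers[3] * image_h)
```

Round-tripping a box through the tokens loses at most half a bin width per coordinate, which is the usual trade-off of this discretization.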

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

no code implementations 19 Nov 2021 Jianfeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question), for vision-language (VL) representation learning.

Image Captioning · Language Modelling +5

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

no code implementations 10 Sep 2021 Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.

Image Captioning · Question Answering +1
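The core mechanism here is textual prompting: the image is summarized as a caption so a frozen language model can answer the question in-context. A minimal sketch of such prompt assembly, assuming an illustrative template and example data (not the paper's exact format):

```python
# Build a few-shot VQA prompt from image captions, in the spirit of
# caption-based prompting. Template and examples are illustrative.
def build_vqa_prompt(caption, question, in_context_examples):
    """Assemble (caption, question, answer) shots followed by the query
    with an empty answer slot for the language model to complete."""
    header = "Please answer the question according to the context.\n\n"
    shots = "".join(
        f"Context: {c}\nQuestion: {q}\nAnswer: {a}\n\n"
        for c, q, a in in_context_examples
    )
    query = f"Context: {caption}\nQuestion: {question}\nAnswer:"
    return header + shots + query

examples = [("A red double-decker bus on a street.",
             "What city is this bus associated with?", "london")]
prompt = build_vqa_prompt("A bowl of sliced fruit on a table.",
                          "Which of these fruits grows on trees?", examples)
```

The resulting string would then be sent to the language model; the model's completion after the final "Answer:" is taken as the prediction.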

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

1 code implementation ICCV 2021 Zhengyuan Yang, Songyang Zhang, Liwei Wang, Jiebo Luo

3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.

Representation Learning · Visual Grounding

TransVG: End-to-End Visual Grounding with Transformers

2 code implementations ICCV 2021 Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li

In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query onto the corresponding region of an image.

Referring Expression Comprehension · Visual Grounding
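The framework's central move is to fuse visual and language tokens with a learnable [REG] token via self-attention and regress the box directly from that token's output. A toy sketch under simplifying assumptions (tiny dimensions, one untrained attention pass, random weights):

```python
# Toy sketch: visual tokens + text tokens + a [REG] token pass through
# self-attention; the [REG] output is mapped to a normalized box.
import math
import random

random.seed(0)
DIM = 8

def rand_vec(d=DIM):
    return [random.uniform(-1, 1) for _ in range(d)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    """One attention pass where queries = keys = values = tokens."""
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(DIM) for k in tokens])
        out.append([sum(w * t[i] for w, t in zip(weights, tokens))
                    for i in range(DIM)])
    return out

def ground(visual_tokens, text_tokens, reg_token, head_weights):
    fused = self_attention(visual_tokens + text_tokens + [reg_token])
    reg_out = fused[-1]  # [REG] has attended to both modalities
    # Linear head + sigmoid -> normalized (cx, cy, w, h)
    return [1 / (1 + math.exp(-dot(row, reg_out))) for row in head_weights]

visual = [rand_vec() for _ in range(4)]  # stand-ins for patch features
text = [rand_vec() for _ in range(3)]    # stand-ins for word embeddings
box = ground(visual, text, rand_vec(), [rand_vec() for _ in range(4)])
```

In the actual model the tokens come from learned visual and linguistic encoders and the attention stack is trained end-to-end; this sketch only shows the data flow through the [REG] token.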

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

no code implementations CVPR 2021 Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even when pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4% compared with a non-TAP baseline.

Language Modelling · Optical Character Recognition +4

Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation

no code implementations 30 Oct 2020 Zhengyuan Yang, Amanda Kay, Yuncheng Li, Wendi Cross, Jiebo Luo

We then evaluate the framework on a proposed URMC dataset, which consists of conversations between a standardized patient and a behavioral health professional, along with expert annotations of body language, emotions, and potential psychiatric symptoms.

Action Recognition · Emotion Recognition

Dynamic Context-guided Capsule Network for Multimodal Machine Translation

1 code implementation 4 Sep 2020 Huan Lin, Fandong Meng, Jinsong Su, Yongjing Yin, Zhengyuan Yang, Yubin Ge, Jie Zhou, Jiebo Luo

Particularly, we represent the input image with global and regional visual features, and introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities.

Multimodal Machine Translation · Representation Learning +1

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

no code implementations CVPR 2021 Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu

Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed.

Contrastive Learning · Knowledge Distillation +3


no code implementations 13 Dec 2019 Zhengyuan Yang, Tushar Kumar, Tianlang Chen, Jinsong Su, Jiebo Luo

In this paper, we study Tracking by Language that localizes the target box sequence in a video based on a language query.

Weakly Supervised Body Part Segmentation with Pose based Part Priors

no code implementations 30 Jul 2019 Zhengyuan Yang, Yuncheng Li, Linjie Yang, Ning Zhang, Jiebo Luo

The core idea is to first convert the sparse weak labels, such as keypoints, into an initial estimate of the body part masks, and then iteratively refine the part mask predictions.

Face Parsing · Semantic Segmentation
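The first step of that pipeline, turning sparse keypoints into initial part-mask estimates, can be sketched as seeding each part with the pixels near its keypoint. The part names and fixed radius here are illustrative assumptions, not the paper's pose-based priors:

```python
# Seed an initial binary mask per body part from its keypoint location.
# Radius and part names are illustrative stand-ins for pose-based priors.
def keypoints_to_part_masks(keypoints, height, width, radius=3):
    """keypoints: {part_name: (row, col)} -> {part_name: binary mask}."""
    masks = {}
    for part, (r, c) in keypoints.items():
        mask = [[1 if (i - r) ** 2 + (j - c) ** 2 <= radius ** 2 else 0
                 for j in range(width)] for i in range(height)]
        masks[part] = mask
    return masks

kps = {"head": (2, 5), "torso": (7, 5)}
masks = keypoints_to_part_masks(kps, height=12, width=12, radius=2)
```

These coarse masks would then serve as the initial estimate that the iterative refinement stage improves.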

Human-Centered Emotion Recognition in Animated GIFs

1 code implementation 27 Apr 2019 Zhengyuan Yang, Yixuan Zhang, Jiebo Luo

The framework consists of a facial attention module and a hierarchical segment temporal module.

Emotion Recognition

Attentive Relational Networks for Mapping Images to Scene Graphs

no code implementations CVPR 2019 Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, Jiebo Luo

Scene graph generation refers to the task of automatically mapping an image into a semantic structural graph, which requires correctly labeling each extracted object and its interaction relationships.

Graph Generation · Object Detection +1

Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences

no code implementations 31 Jan 2018 Zhengyuan Yang, Yuncheng Li, Jianchao Yang, Jiebo Luo

The attention mechanism is important for skeleton-based action recognition because spatio-temporal key stages exist in a sequence while the joint predictions can be inaccurate.

Action Recognition · Skeleton Based Action Recognition
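The role of attention described above, letting key stages contribute more than noisy frames, can be sketched as attention-weighted temporal pooling over per-frame skeleton features. The scoring function below is an illustrative stand-in for a learned one:

```python
# Attention-weighted pooling over time: each frame gets a scalar score,
# softmax turns scores into weights, and the clip representation is the
# weighted average. The linear scoring is a stand-in for a learned net.
import math

def temporal_attention_pool(frames, score_weights):
    """frames: list of per-frame feature vectors; score_weights: vector
    producing one relevance score per frame."""
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in frames]
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    alphas = [e / z for e in exp]  # softmax over the time axis
    dim = len(frames[0])
    pooled = [sum(a * f[i] for a, f in zip(alphas, frames))
              for i in range(dim)]
    return pooled, alphas

frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]  # toy per-frame features
pooled, alphas = temporal_attention_pool(frames, [1.0, 1.0])
```

Here the middle frame scores highest, so it dominates the pooled feature, which is exactly the behavior one wants around a key stage of an action.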

End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perception

1 code implementation 20 Jan 2018 Zhengyuan Yang, Yixuan Zhang, Jerry Yu, Junjie Cai, Jiebo Luo

In this work, we propose a multi-task learning framework to predict the steering angle and speed control simultaneously in an end-to-end manner.

Autonomous Driving · Multi-Task Learning +2
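The multi-task setup described above, predicting steering angle and speed simultaneously, amounts to a shared trunk feeding two regression heads trained with a weighted joint loss. A minimal sketch with illustrative linear layers and an assumed loss weighting (the paper's network and weights differ):

```python
# Shared linear trunk with two scalar heads (steering, speed) and a
# weighted sum of per-task squared errors. All weights are illustrative.
def forward(features, trunk_w, steer_w, speed_w):
    """Shared trunk followed by two task-specific scalar heads."""
    hidden = [sum(w * x for w, x in zip(row, features)) for row in trunk_w]
    steer = sum(w * h for w, h in zip(steer_w, hidden))
    speed = sum(w * h for w, h in zip(speed_w, hidden))
    return steer, speed

def multitask_loss(pred, target, speed_weight=0.5):
    """Jointly penalize both tasks; speed_weight balances the two."""
    steer_err = (pred[0] - target[0]) ** 2
    speed_err = (pred[1] - target[1]) ** 2
    return steer_err + speed_weight * speed_err

pred = forward([0.2, -0.4, 0.1],
               trunk_w=[[0.5, 0.1, -0.2], [0.0, 0.3, 0.4]],
               steer_w=[1.0, -1.0], speed_w=[0.5, 0.5])
loss = multitask_loss(pred, target=(0.0, 10.0))
```

Training both heads against this single loss is what makes the control end-to-end: gradients from both tasks shape the shared features.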
