Search Results for author: Wentian Zhao

Found 11 papers, 1 paper with code

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

no code implementations • 2 Dec 2018 • Shaojie Wang, Wentian Zhao, Ziyi Kou, Chenliang Xu

Furthermore, we study multiple modalities including description and transcripts for the purpose of boosting video understanding.

Logical Reasoning, Question Answering, +1

GAN-EM: GAN based EM learning framework

no code implementations • 2 Dec 2018 • Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, Chenliang Xu

To overcome such limitation, we propose a GAN based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity.

Clustering, Dimensionality Reduction, +2

Relational Reasoning using Prior Knowledge for Visual Captioning

no code implementations • 4 Jun 2019 • Jingyi Hou, Xinxiao Wu, Yayun Qi, Wentian Zhao, Jiebo Luo, Yunde Jia

Extensive experiments on the MS-COCO image captioning benchmark and the MSVD video captioning benchmark validate the superiority of our method on leveraging prior commonsense knowledge to enhance relational reasoning for visual captioning.

Image Captioning, object-detection, +4

Weakly Supervised Localization Using Background Images

no code implementations • 9 Sep 2019 • Ziyi Kou, Wentian Zhao, Guofeng Cui, Shaojie Wang

Weakly Supervised Object Localization (WSOL) methods usually rely on fully convolutional networks in order to obtain class activation maps (CAMs) of targeted labels.

Object, Weakly-Supervised Object Localization

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation

no code implementations • 17 Nov 2019 • Ziyi Kou, Guofeng Cui, Shaojie Wang, Wentian Zhao, Chenliang Xu

In this paper, we propose a confidence segmentation (ConfSeg) module that builds confidence score for each pixel in CAM without introducing additional hyper-parameters.

Object, Weakly-Supervised Object Localization

MemCap: Memorizing Style Knowledge for Image Captioning

1 code implementation • AAAI 2020 • Wentian Zhao, Xinxiao Wu, Xiaoxun Zhang

Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately.

Image Captioning, Language Modelling, +1

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

no code implementations • 26 Jul 2021 • Wentian Zhao, Yao Hu, HeDa Wang, Xinxiao Wu, Jiebo Luo

Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article.

Graph Attention, Image Captioning, +1

Multi-modal Dependency Tree for Video Captioning

no code implementations • NeurIPS 2021 • Wentian Zhao, Xinxiao Wu, Jiebo Luo

To this end, we propose a novel video captioning method that generates a sentence by first constructing a multi-modal dependency tree and then traversing the constructed tree, where the syntactic structure and semantic relationship in the sentence are represented by the tree topology.

Caption Generation, Dependency Parsing, +3

Text2Layer: Layered Image Generation using Latent Diffusion Model

no code implementations • 19 Jul 2023 • Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien

To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation.

Image Generation, Image Segmentation, +1

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

no code implementations • 26 Dec 2023 • Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS).

Novel View Synthesis, Representation Learning
