Search Results for author: Yitian Yuan

Found 9 papers, 4 papers with code

Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment

no code implementations • 15 Dec 2023 • Xiaoxu Xu, Yitian Yuan, Qiudan Zhang, Wenhui Wu, Zequn Jie, Lin Ma, Xu Wang

During the inference stage, the learned text-3D correspondence helps ground text queries to the 3D target objects even without 2D images (a sketch of this matching step follows this entry).

Natural Language Queries · Scene Understanding +1
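As a rough illustration of that inference step, the sketch below matches a text query embedding against candidate 3D object embeddings by cosine similarity and picks the best match. The PyTorch framing, shapes, and function names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ground_text_in_3d(text_emb: torch.Tensor, proposal_embs: torch.Tensor) -> int:
    """Pick the 3D object proposal whose embedding best matches the text query.

    text_emb:      (d,)   embedding of the sentence query
    proposal_embs: (n, d) embeddings of n candidate 3D objects,
                   assumed aligned into the same space as the text
    Returns the index of the best-matching proposal.
    """
    # Cosine similarity between the query and every 3D proposal.
    sims = F.cosine_similarity(text_emb.unsqueeze(0), proposal_embs, dim=-1)  # (n,)
    return int(sims.argmax())

# Hypothetical usage: 8 candidate objects with 256-d aligned embeddings.
text_emb = torch.randn(256)
proposal_embs = torch.randn(8, 256)
best = ground_text_in_3d(text_emb, proposal_embs)
```

Note that no 2D image features appear anywhere in this step: once text and 3D embeddings share a space, grounding reduces to nearest-neighbor retrieval.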

Controllable Video Captioning with an Exemplar Sentence

1 code implementation • 2 Dec 2021 • Yitian Yuan, Lin Ma, Jingwen Wang, Wenwu Zhu

In this paper, we investigate a novel and challenging task, namely controllable video captioning with an exemplar sentence.

Caption Generation · Sentence +2

Syntax Customized Video Captioning by Imitating Exemplar Sentences

1 code implementation • 2 Dec 2021 • Yitian Yuan, Lin Ma, Wenwu Zhu

Enhancing the diversity of sentences used to describe video content is an important problem in recent video captioning research.

Sentence · valid +1

A Survey on Temporal Sentence Grounding in Videos

no code implementations • 16 Sep 2021 • Xiaohan Lan, Yitian Yuan, Xin Wang, Zhi Wang, Wenwu Zhu

In this survey, we give a comprehensive overview of TSGV, which i) summarizes the taxonomy of existing methods, ii) provides a detailed description of the evaluation protocols (i.e., datasets and metrics) used in TSGV, and iii) discusses in depth potential problems of current benchmarking designs and research directions for further investigation.

Benchmarking · Sentence +2

A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric

no code implementations • 22 Jan 2021 • Yitian Yuan, Xiaohan Lan, Xin Wang, Long Chen, Zhi Wang, Wenwu Zhu

All the results demonstrate that the re-organized dataset splits and the new metric can better monitor progress in TSGV.

Benchmarking · Sentence +1

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

1 code implementation • NeurIPS 2019 • Yitian Yuan, Lin Ma, Jingwen Wang, Wei Liu, Wenwu Zhu

Temporal sentence grounding in videos aims to detect and localize one target video segment that semantically corresponds to a given sentence (a sketch of the sentence-conditioned modulation named in the title follows this entry).

Sentence · Temporal Sentence Grounding
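The title points to modulating video features conditioned on sentence semantics. The sketch below shows one generic way such conditioning could look, assuming FiLM-style per-channel scale and shift; the class name, dimensions, and wiring are hypothetical, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SentenceConditionedModulation(nn.Module):
    """FiLM-style modulation: a sentence embedding produces per-channel
    scale (gamma) and shift (beta) applied to temporal video features."""

    def __init__(self, sent_dim: int, video_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(sent_dim, video_dim)
        self.to_beta = nn.Linear(sent_dim, video_dim)

    def forward(self, video_feats: torch.Tensor, sent_emb: torch.Tensor) -> torch.Tensor:
        # video_feats: (batch, time, video_dim); sent_emb: (batch, sent_dim)
        gamma = self.to_gamma(sent_emb).unsqueeze(1)  # (batch, 1, video_dim)
        beta = self.to_beta(sent_emb).unsqueeze(1)    # (batch, 1, video_dim)
        # Scale and shift every time step by the sentence-derived parameters.
        return gamma * video_feats + beta

# Hypothetical usage: 32 frames of 512-d features, a 300-d sentence embedding.
mod = SentenceConditionedModulation(sent_dim=300, video_dim=512)
out = mod(torch.randn(2, 32, 512), torch.randn(2, 300))
```

The design choice here is that the sentence reshapes the video representation itself, rather than being fused in only at a final scoring layer.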

Sentence Specified Dynamic Video Thumbnail Generation

1 code implementation • 12 Aug 2019 • Yitian Yuan, Lin Ma, Wenwu Zhu

With the tremendous growth of video on the Internet, video thumbnails, which provide previews of video content, are becoming increasingly important in shaping users' online search experience.

Sentence

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression

no code implementations • 19 Apr 2018 • Yitian Yuan, Tao Mei, Wenwu Zhu

Then, a multi-modal co-attention mechanism is introduced to generate not only video attention, which reflects the global video structure, but also sentence attention, which highlights the crucial details for temporal localization (a generic sketch of such co-attention follows this entry).

regression · Sentence +1
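The sketch below illustrates a generic co-attention computation: an affinity matrix between frame and word features yields both a video attention over time and a sentence attention over words. The shapes and the max-pooling choices are assumptions for illustration, not the paper's exact design.

```python
import torch

def co_attention(video_feats: torch.Tensor, word_feats: torch.Tensor):
    """Generic multi-modal co-attention.

    video_feats: (t, d) frame-level features
    word_feats:  (m, d) word-level features
    Returns attention-pooled video and sentence summaries, each (d,).
    """
    # Affinity between every frame and every word.
    affinity = video_feats @ word_feats.T                            # (t, m)
    # Video attention: how much each frame matters given the sentence.
    video_attn = torch.softmax(affinity.max(dim=1).values, dim=0)    # (t,)
    # Sentence attention: which words matter given the video.
    word_attn = torch.softmax(affinity.max(dim=0).values, dim=0)     # (m,)
    attended_video = video_attn @ video_feats                        # (d,)
    attended_sentence = word_attn @ word_feats                       # (d,)
    return attended_video, attended_sentence

# Hypothetical usage: 64 frames and a 12-word sentence, both 256-d.
v, s = co_attention(torch.randn(64, 256), torch.randn(12, 256))
```

The two attended summaries could then feed a small regression head that directly predicts the start and end of the target segment, matching the attention-based location regression named in the title.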
