Search Results for author: Yuqing Song

Found 13 papers, 6 papers with code

Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

1 code implementation 11 Jun 2021 Ludan Ruan, Jieting Chen, Yuqing Song, ShiZhe Chen, Qin Jin

For the object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post processing method to make the grounding results more faithful.

Caption Generation Object +1

Towards Diverse Paragraph Captioning for Untrimmed Videos

1 code implementation CVPR 2021 Yuqing Song, ShiZhe Chen, Qin Jin

Video paragraph captioning aims to describe multiple events in untrimmed videos with descriptive paragraphs.

Descriptive Event Detection

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training

1 code implementation 25 Aug 2021 Yuqing Song, ShiZhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang

Firstly, there are many specialized jargons in the product description, which are ambiguous to translate without the product image.

Machine Translation Translation

Unifying Event Detection and Captioning as Sequence Generation via Pre-Training

1 code implementation 18 Jul 2022 Qi Zhang, Yuqing Song, Qin Jin

Dense video captioning aims to generate corresponding text descriptions for a series of events in the untrimmed video, which can be divided into two sub-tasks, event detection and event captioning.

Dense Video Captioning Event Detection

Accommodating Audio Modality in CLIP for Multimodal Processing

1 code implementation 12 Mar 2023 Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin

In this paper, we extend the state-of-the-art Vision-Language model CLIP to accommodate the audio modality for Vision-Language-Audio multimodal processing.

AudioCaps Contrastive Learning +4

RUC+CMU: System Report for Dense Captioning Events in Videos

no code implementations 22 Jun 2018 Shizhe Chen, Yuqing Song, Yida Zhao, Jiarong Qiu, Qin Jin, Alexander Hauptmann

This notebook paper presents our system in the ActivityNet Dense Captioning in Video task (task 3).

Caption Generation Dense Captioning +1

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

no code implementations 11 Jul 2019 Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

The overall system achieves the state-of-the-art performance on the dense-captioning events in video task with 9.91 METEOR score on the challenge testing set.

Dense Captioning Dense Video Captioning

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards

no code implementations 15 Aug 2019 Yuqing Song, Shi-Zhe Chen, Yida Zhao, Qin Jin

We employ self-supervision from mono-lingual corpus in the target language to provide fluency reward, and propose a multi-level visual semantic matching model to provide both sentence-level and concept-level visual relevancy rewards.

Caption Generation Image Captioning +3

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019

no code implementations 15 Oct 2019 Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu

This notebook paper presents our model in the VATEX video captioning challenge.

Video Captioning

Progressive Learning for Image Retrieval with Hybrid-Modality Queries

no code implementations 24 Apr 2022 Yida Zhao, Yuqing Song, Qin Jin

Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities.

Image Retrieval Retrieval +1

Some theoretical results on discrete contour trees

no code implementations 24 Jun 2022 Yuqing Song

Contours are defined on a continuous scalar field.
