Search Results for author: Yuqing Song

Found 13 papers, 6 papers with code

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

2 code implementations • 11 Mar 2021 • Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, ShiZhe Chen, Zhiwu Lu, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen

We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model.

Ranked #1 on Image Retrieval on RUC-CAS-WenLan

Contrastive Learning Image Captioning +2



Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

1 code implementation • 11 Jun 2021 • Ludan Ruan, Jieting Chen, Yuqing Song, ShiZhe Chen, Qin Jin

For object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post-processing method to make the grounding results more faithful.

Caption Generation Object +1



Towards Diverse Paragraph Captioning for Untrimmed Videos

1 code implementation • CVPR 2021 • Yuqing Song, ShiZhe Chen, Qin Jin

Video paragraph captioning aims to describe multiple events in untrimmed videos with descriptive paragraphs.

Descriptive Event Detection


Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training

1 code implementation • 25 Aug 2021 • Yuqing Song, ShiZhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang

Firstly, product descriptions contain much specialized jargon, which is ambiguous to translate without the product image.

Machine Translation Translation


Unifying Event Detection and Captioning as Sequence Generation via Pre-Training

1 code implementation • 18 Jul 2022 • Qi Zhang, Yuqing Song, Qin Jin

Dense video captioning aims to generate corresponding text descriptions for a series of events in the untrimmed video, which can be divided into two sub-tasks, event detection and event captioning.

Dense Video Captioning Event Detection


Accommodating Audio Modality in CLIP for Multimodal Processing

1 code implementation • 12 Mar 2023 • Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin

In this paper, we extend the state-of-the-art Vision-Language model CLIP to accommodate the audio modality for Vision-Language-Audio multimodal processing.

AudioCaps Contrastive Learning +4


RUC+CMU: System Report for Dense Captioning Events in Videos

no code implementations • 22 Jun 2018 • Shizhe Chen, Yuqing Song, Yida Zhao, Jiarong Qiu, Qin Jin, Alexander Hauptmann

This notebook paper presents our system in the ActivityNet Dense Captioning in Video task (task 3).

Caption Generation Dense Captioning +1


Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

no code implementations • 11 Jul 2019 • Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

The overall system achieves state-of-the-art performance on the dense-captioning events in video task, with a 9.91 METEOR score on the challenge testing set.

Dense Captioning Dense Video Captioning


Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards

no code implementations • 15 Aug 2019 • Yuqing Song, Shi-Zhe Chen, Yida Zhao, Qin Jin

We employ self-supervision from a monolingual corpus in the target language to provide a fluency reward, and propose a multi-level visual semantic matching model to provide both sentence-level and concept-level visual relevancy rewards.

Caption Generation Image Captioning +3


Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019

no code implementations • 15 Oct 2019 • Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu

This notebook paper presents our model in the VATEX video captioning challenge.

Video Captioning


Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning

no code implementations • 14 Jun 2020 • Yuqing Song, Shi-Zhe Chen, Yida Zhao, Qin Jin

Detecting meaningful events in an untrimmed video is essential for dense video captioning.

Ranked #3 on Dense Video Captioning on ActivityNet Captions

Dense Captioning Dense Video Captioning +1


Progressive Learning for Image Retrieval with Hybrid-Modality Queries

no code implementations • 24 Apr 2022 • Yida Zhao, Yuqing Song, Qin Jin

Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities.

Image Retrieval Retrieval +1


Some theoretical results on discrete contour trees

no code implementations • 24 Jun 2022 • Yuqing Song

Contours are defined on a continuous scalar field.
