Search Results for author: Fengyun Rao

Found 10 papers, 4 papers with code

Spatial-Semantic Collaborative Cropping for User Generated Content

1 code implementation • 16 Jan 2024 • Yukun Su, Yiwen Cao, Jingliang Deng, Fengyun Rao, Qingyao Wu

A large amount of User Generated Content (UGC) is uploaded to the Internet daily and displayed to people worldwide through client-side devices (e.g., mobile and PC).

Image Cropping

Inter-X: Towards Versatile Human-Human Interaction Analysis

no code implementations • 26 Dec 2023 • Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang

We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions, semantic interaction categories, interaction order, and the relationship and personality of the subjects.

Image Captioning with Multi-Context Synthetic Data

no code implementations • 29 May 2023 • Feipeng Ma, Yizhou Zhou, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun

This potential can be harnessed to create synthetic image-text pairs for training captioning models.

Image Captioning · Language Modelling · +2
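As an illustration of the synthetic-pair idea the summary mentions, here is a minimal sketch that renders captions into images with a text-to-image model. The model choice (Stable Diffusion via Hugging Face diffusers) and the `make_synthetic_pairs` helper are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch, not the paper's pipeline: render captions into
# images with a text-to-image model to form synthetic training pairs.
import os
import torch
from diffusers import StableDiffusionPipeline

def make_synthetic_pairs(captions, out_dir="synthetic_pairs"):
    """Render each caption to an image; return (image_path, caption) pairs."""
    os.makedirs(out_dir, exist_ok=True)
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pairs = []
    for i, caption in enumerate(captions):
        image = pipe(caption).images[0]              # PIL.Image
        path = os.path.join(out_dir, f"{i:06d}.png")
        image.save(path)
        pairs.append((path, caption))
    return pairs

# Usage: pairs = make_synthetic_pairs(["a dog surfing a wave at sunset"])
```

The resulting (image_path, caption) pairs could then be mixed into a captioning model's training set, which is the general direction the abstract points at.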

A Dual-level Detection Method for Video Copy Detection

1 code implementation • 21 May 2023 • Tianyi Wang, Feipeng Ma, Zhenhua Liu, Fengyun Rao

With the development of multimedia technology, Video Copy Detection has become a crucial problem for social media platforms.

Copy Detection · Partial Video Copy Detection · +2
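The "dual-level" framing suggests a video-level decision combined with frame-level localization. A minimal frame-level sketch (not the paper's method): given per-frame embeddings of a query and a reference video, a copied segment shows up as a sustained high-similarity diagonal in the cosine-similarity matrix. How the embeddings are extracted (e.g., a CNN or CLIP encoder per frame) is left as an assumption.

```python
# Frame-level copy-detection sketch: a copied segment appears as a run of
# high values along some diagonal of the frame-similarity matrix.
import numpy as np

def frame_similarity_matrix(query_feats, ref_feats):
    """Cosine similarity between L2-normalized frame embeddings."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    return q @ r.T                      # shape: (n_query, n_ref)

def has_copied_segment(sim, thresh=0.85, min_len=5):
    """Flag a copy if some diagonal holds >= min_len frames above thresh."""
    n_q, n_r = sim.shape
    for offset in range(-n_q + 1, n_r):
        run = 0
        for s in np.diagonal(sim, offset=offset):
            run = run + 1 if s >= thresh else 0
            if run >= min_len:
                return True
    return False
```

Scanning diagonals rather than individual cells enforces temporal consistency, which is what separates a genuine copied clip from isolated near-duplicate frames.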

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

Object Detection · +2
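To make the notion of a "class-agnostic" training signal concrete, here is a hedged sketch: run a pretrained detector on unlabeled images and keep only the localization outputs, collapsing every predicted class to a single generic label. The paper's cascaded training scheme is more involved; the torchvision model and `class_agnostic_targets` helper below are assumptions for illustration only.

```python
# Sketch of a class-agnostic pseudo-label signal (not CA-SSL's full method):
# keep boxes/masks from a pretrained detector but discard class identity.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def class_agnostic_targets(images, score_thresh=0.7):
    """Return pseudo-targets with every predicted class collapsed to label 1."""
    targets = []
    for out in model(images):           # images: list of CHW float tensors
        keep = out["scores"] >= score_thresh
        targets.append({
            "boxes": out["boxes"][keep],
            "masks": out["masks"][keep],
            "labels": torch.ones(int(keep.sum()), dtype=torch.int64),
        })
    return targets
```

Because the labels carry no category information, such targets are task-unrelated in the abstract's terminology and can pretrain localization ability without biasing the model toward a fixed label set.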

CLIP4Caption ++: Multi-CLIP for Video Caption

no code implementations • 11 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhaoyang Zeng, Fengyun Rao, Dian Li

On the proposed CLIP4Caption++, we employ the advanced encoder-decoder architecture X-Transformer as our main framework and make the following improvements: 1) we utilize three strong pre-trained CLIP models to extract text-related appearance visual features.

Sentence
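The summary mentions extracting appearance features with three pre-trained CLIP models. A minimal sketch of that multi-encoder idea, assuming the `open_clip` library: embed each frame with several CLIP image encoders and concatenate the normalized embeddings. Which three CLIP variants the paper actually used is an assumption here.

```python
# Hedged sketch of multi-CLIP feature extraction: embed each video frame
# with several pre-trained CLIP image encoders and concatenate the results.
import torch
import open_clip

NAMES = [("ViT-B-32", "openai"), ("ViT-B-16", "openai"), ("ViT-L-14", "openai")]
encoders = []
for name, pretrained in NAMES:
    model, _, preprocess = open_clip.create_model_and_transforms(
        name, pretrained=pretrained
    )
    encoders.append((model.eval(), preprocess))

@torch.no_grad()
def multi_clip_features(frames):        # frames: list of PIL.Image
    """Concatenate per-frame embeddings from all CLIP encoders."""
    feats = []
    for model, preprocess in encoders:
        batch = torch.stack([preprocess(f) for f in frames])
        emb = model.encode_image(batch)
        feats.append(emb / emb.norm(dim=-1, keepdim=True))
    return torch.cat(feats, dim=-1)     # (n_frames, sum of embed dims)
```

Concatenating embeddings from differently sized encoders gives the downstream captioning decoder complementary views of the same frame, which is the apparent motivation for using multiple CLIP models.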
