Search Results for author: Pai Peng

Found 14 papers, 9 papers with code

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

2 code implementations CVPR 2021 Jinpeng Wang, Yuting Gao, Ke Li, Yiqi Lin, Andy J. Ma, Hao Cheng, Pai Peng, Feiyue Huang, Rongrong Ji, Xing Sun

We then force the model to pull the feature of the distracting video and the feature of the original video closer, so that the model is explicitly constrained to resist the background influence and focus more on the motion changes.
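A minimal sketch of this idea (an assumption-laden illustration, not the authors' released code): blend a static frame from another video into every frame of a clip to create a distracting version, then penalize the feature distance between the two clips. Here `encoder` stands in for a hypothetical video feature extractor.

```python
import torch
import torch.nn.functional as F

def background_mix(clip, alpha=0.5):
    """Create a 'distracting' clip by blending a static frame from another
    video in the batch into every frame of the input clip.
    clip: (B, C, T, H, W) batch of video clips."""
    # Use the first frame of a shuffled batch element as the static background.
    background = clip[torch.randperm(clip.size(0)), :, :1]  # (B, C, 1, H, W)
    return (1 - alpha) * clip + alpha * background

def background_consistency_loss(encoder, clip, alpha=0.5):
    """Pull the feature of the distracted clip toward the feature of the
    original clip so the encoder learns to ignore the background."""
    distracted = background_mix(clip, alpha)
    f_orig = F.normalize(encoder(clip), dim=-1)
    f_dist = F.normalize(encoder(distracted), dim=-1)
    # Negative cosine similarity: minimizing it pulls the two features closer.
    return -(f_orig * f_dist).sum(dim=-1).mean()
```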

Representation Learning · Self-Supervised Learning

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination

1 code implementation 27 Jul 2020 Penghao Zhou, Chong Zhou, Pai Peng, Junlong Du, Xing Sun, Xiaowei Guo, Feiyue Huang

Greedy-NMS inherently raises a dilemma: a lower NMS threshold potentially lowers the recall rate, while a higher threshold introduces more false positives.
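The dilemma can be seen directly in a plain greedy NMS routine, where the single `iou_threshold` is the knob in question; this is the standard baseline the paper improves upon, not NOH-NMS itself.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS. A low threshold suppresses overlapping true
    pedestrians (lower recall); a high one keeps duplicate detections
    (more false positives)."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        remaining = order[1:]
        # Drop boxes that overlap the kept box more than the threshold.
        order = remaining[iou(boxes[i], boxes[remaining]) <= iou_threshold]
    return keep
```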

Hallucination · Object Detection · +1

Global2Local: Efficient Structure Search for Video Action Segmentation

2 code implementations CVPR 2021 Shang-Hua Gao, Qi Han, Zhong-Yu Li, Pai Peng, Liang Wang, Ming-Ming Cheng

Our search scheme exploits both a global search to find coarse combinations and a local search to further refine the receptive field combination patterns.
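A schematic of the coarse-to-fine search under simplified assumptions (searching per-layer dilation rates; `evaluate` is a hypothetical scoring function supplied by the caller, not the paper's implementation):

```python
import itertools
import random

def global_then_local_search(evaluate, num_layers=4, coarse_rates=(1, 4, 16, 64),
                             local_steps=20, seed=0):
    """Coarse global search over dilation-rate combinations, then a local
    search that perturbs one layer at a time around the best candidate.
    evaluate(rates) -> score is assumed to be provided by the caller."""
    rng = random.Random(seed)

    # Global stage: score a coarse grid of combinations exhaustively.
    best = max(itertools.product(coarse_rates, repeat=num_layers), key=evaluate)
    best_score = evaluate(best)

    # Local stage: refine one receptive field at a time.
    for _ in range(local_steps):
        layer = rng.randrange(num_layers)
        candidate = list(best)
        candidate[layer] = max(1, int(candidate[layer] * rng.choice([0.5, 2])))
        candidate = tuple(candidate)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```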

Action Segmentation · Segmentation

FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network

1 code implementation 19 Jan 2023 Huafeng Liu, Pai Peng, Tao Chen, Qiong Wang, Yazhou Yao, Xian-Sheng Hua

Few-shot semantic segmentation is the task of learning to locate each pixel of the novel class in the query image with only a few annotated support images.
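For context, a bare-bones prototype-matching baseline for this setting (a generic sketch of the task setup, not FECANet's method): average the support features inside the support mask to form a class prototype, then score each query pixel by cosine similarity to it.

```python
import torch
import torch.nn.functional as F

def prototype_segmentation(support_feat, support_mask, query_feat):
    """support_feat, query_feat: (C, H, W) features from a shared backbone.
    support_mask: (H, W) binary mask of the novel class in the support image.
    Returns an (H, W) similarity map for the query image."""
    c = support_feat.size(0)
    mask = support_mask.float().unsqueeze(0)                            # (1, H, W)
    # Masked average pooling: the class prototype vector.
    prototype = (support_feat * mask).sum(dim=(1, 2)) / (mask.sum() + 1e-6)  # (C,)
    # Cosine similarity between every query pixel and the prototype.
    query = F.normalize(query_feat.view(c, -1), dim=0)                  # (C, H*W)
    proto = F.normalize(prototype, dim=0).unsqueeze(1)                  # (C, 1)
    return (proto * query).sum(dim=0).view(query_feat.shape[1:])
```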

Few-Shot Semantic Segmentation

Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query

1 code implementation ICCV 2021 Guanyu Cai, Jun Zhang, Xinyang Jiang, Yifei Gong, Lianghua He, Fufu Yu, Pai Peng, Xiaowei Guo, Feiyue Huang, Xing Sun

However, the performance of existing methods suffers in practice, since users are likely to provide an incomplete description of an image, which often leads to results filled with false positives that fit the incomplete description.

Cross-Modal Retrieval · Image Retrieval · +1

REPRINT: a randomized extrapolation based on principal components for data augmentation

1 code implementation 26 Apr 2022 Jiale Wei, Qiyuan Chen, Pai Peng, Benjamin Guedj, Le Li

This paper presents REPRINT, a simple and effective hidden-space data augmentation method for imbalanced data classification.
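A hedged sketch of the general idea, randomized extrapolation along principal components of same-class hidden features; the sampling distribution and scales below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def pca_extrapolate(features, n_components=5, n_new=10, scale=0.5, seed=0):
    """Augment a minority class by extrapolating hidden-space features
    along the principal components of that class.
    features: (N, D) hidden representations belonging to one class."""
    rng = np.random.default_rng(seed)
    mean = features.mean(axis=0)
    centered = features - mean
    # Principal directions via SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]                                # (k, D)

    # Start from randomly picked real samples of the class.
    base = features[rng.integers(0, len(features), size=n_new)]   # (n_new, D)
    # Random coefficients decide how far to push along each component.
    coeffs = rng.normal(0.0, scale, size=(n_new, n_components))   # (n_new, k)
    return base + coeffs @ components
```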

Data Augmentation · text-classification · +1

Deep Co-Space: Sample Mining Across Feature Transformation for Semi-Supervised Learning

no code implementations 28 Jul 2017 Ziliang Chen, Keze Wang, Xiao Wang, Pai Peng, Ebroul Izquierdo, Liang Lin

Aiming to improve the performance of visual classification in a cost-effective manner, this paper proposes an incremental semi-supervised learning paradigm called Deep Co-Space (DCS).

Classification · General Classification · +1

Representative Batch Normalization With Feature Calibration

no code implementations CVPR 2021 Shang-Hua Gao, Qi Han, Duo Li, Ming-Ming Cheng, Pai Peng

We propose to add a simple yet effective feature calibration scheme to the centering and scaling operations of BatchNorm, enhancing the instance-specific representations at negligible computational cost.
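One plausible reading of calibrating the centering and scaling steps with instance-specific statistics, written as a simplified module; this is an illustrative sketch, not the exact formulation in the paper.

```python
import torch
import torch.nn as nn

class CalibratedBatchNorm2d(nn.Module):
    """BatchNorm2d with a lightweight instance-statistic calibration added
    to the centering and scaling steps (a simplified illustration only)."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, eps=eps)
        # Learnable per-channel weights controlling how strongly instance
        # statistics adjust the batch statistics (zero-init: no effect at start).
        self.center_weight = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.scale_weight = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        # Instance-specific mean per sample and channel.
        inst_mean = x.mean(dim=(2, 3), keepdim=True)
        # Centering calibration: shift each instance by a learned fraction
        # of its own mean before the usual batch normalization.
        x = x - self.center_weight * inst_mean
        x = self.bn(x)
        # Scaling calibration: gate the normalized features with an
        # instance-dependent factor (identity while scale_weight is zero).
        gate = torch.tanh(self.scale_weight * x.mean(dim=(2, 3), keepdim=True))
        return x * (1 + gate)
```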

PR-Net: Preference Reasoning for Personalized Video Highlight Detection

no code implementations ICCV 2021 Runnan Chen, Penghao Zhou, Wenzhe Wang, Nenglun Chen, Pai Peng, Xing Sun, Wenping Wang

Personalized video highlight detection aims to shorten a long video to interesting moments according to a user's preference, and has recently attracted the community's attention.

Highlight Detection · Semantic Similarity · +1

An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition

no code implementations 20 Sep 2022 Yang Wu, Pai Peng, Zhenyu Zhang, Yanyan Zhao, Bing Qin

At the low level, we propose progressive tri-modal attention, which models the tri-modal feature interactions with a two-pass strategy and further leverages these interactions to significantly reduce computation and memory complexity by shortening the input token length.
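The motivation for shortening the token sequence is the quadratic cost of self-attention; a back-of-the-envelope comparison (generic arithmetic with illustrative token counts, not figures from the paper):

```python
def attention_cost(tokens, dim):
    """Rough FLOP and memory proxies for one self-attention layer:
    the score matrix is tokens x tokens, projections are tokens x dim."""
    flops = 2 * tokens * tokens * dim      # QK^T plus the attention-weighted sum
    memory = tokens * tokens               # attention score matrix entries
    return flops, memory

# A long concatenated multi-modal sequence vs. a reduced fused sequence
# (illustrative lengths only).
for n in (300, 100):
    flops, mem = attention_cost(n, dim=512)
    print(f"{n:4d} tokens -> {flops / 1e6:8.1f} MFLOPs, {mem / 1e3:7.1f}K score entries")
```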

Emotion Recognition

Locate before Answering: Answer Guided Question Localization for Video Question Answering

no code implementations 5 Oct 2022 Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang

Considering that the question usually concerns only a short temporal range of the video, we propose to first localize the question to a segment of the video and then infer the answer using only the located segment.
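A schematic two-stage pipeline for this locate-before-answering strategy, with hypothetical `localizer` and `answerer` modules standing in for the paper's components:

```python
import torch

def locate_then_answer(video_feats, question_emb, localizer, answerer, window=16):
    """Two-stage video QA sketch: score fixed-length segments against the
    question, keep the best-matching segment, and answer from it only.
    video_feats: (T, D) per-frame features; question_emb: (D,) question vector.
    `localizer(segment, question) -> scalar score` and
    `answerer(segment, question) -> answer` are assumed interfaces."""
    T = video_feats.size(0)
    starts = range(0, max(T - window, 0) + 1, window // 2)   # overlapping windows
    segments = [video_feats[s:s + window] for s in starts]

    # Stage 1: locate the segment most relevant to the question.
    scores = torch.stack([localizer(seg, question_emb) for seg in segments])
    best = segments[int(scores.argmax())]

    # Stage 2: infer the answer from the located segment only.
    return answerer(best, question_emb)
```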

Question Answering · Video Question Answering
