Search Results for author: Sucheng Ren

Found 25 papers, 15 papers with code

Compress & Align: Curating Image-Text Data with Human Knowledge

no code implementations 11 Dec 2023 Lei Zhang, Fangxun Shu, Sucheng Ren, Bingchen Zhao, Hao Jiang, Cihang Xie

The massive growth of image-text data through web crawling inherently presents the challenge of variability in data quality.

Image Captioning · Text Retrieval

Rejuvenating image-GPT as Strong Visual Representation Learners

1 code implementation 4 Dec 2023 Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie

This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict next pixels for visual representation learning.

Representation Learning
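As a rough illustration of the objective this line of work builds on, the sketch below (hypothetical and NumPy-only; actual iGPT-style models train a transformer over quantized pixel tokens) shows how raster-flattening an image yields the (prefix, next-pixel) pairs that autoregressive pretraining learns to predict:

```python
import numpy as np

def next_pixel_pairs(image):
    """Flatten an image in raster order and emit (prefix, next-pixel) pairs,
    the supervision signal behind iGPT-style autoregressive pretraining."""
    seq = image.flatten()
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]

img = np.arange(9).reshape(3, 3)   # toy 3x3 "image" with pixel values 0..8
pairs = next_pixel_pairs(img)      # the model predicts each target from its prefix
```

In the real setting the prefix is fed to a causal transformer and the target pixel (or pixel token) is predicted with a softmax over the value vocabulary.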

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

1 code implementation 23 Aug 2023 Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He

Non-photorealistic videos are in demand with the wave of the metaverse, but lack sufficient research attention.

Saliency Detection

SG-Former: Self-guided Transformer with Evolving Token Reallocation

1 code implementation ICCV 2023 Sucheng Ren, Xingyi Yang, Songhua Liu, Xinchao Wang

At the heart of our approach is a significance map, estimated through hybrid-scale self-attention and evolving during training, which reallocates tokens based on the significance of each region.

DeepMIM: Deep Supervision for Masked Image Modeling

1 code implementation 15 Mar 2023 Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which adds extra supervision to the intermediate features of a neural network, was widely used for image classification in the early deep learning era, since it significantly reduces training difficulty and eases optimization, e.g., by mitigating vanishing gradients compared with vanilla training.

Image Classification · object-detection +2
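To make the recipe concrete, here is a minimal, hypothetical sketch of deep supervision: auxiliary linear heads score intermediate features against the same target, and their losses are added, down-weighted, to the final loss. The head shapes and the 0.3 weight are illustrative choices, not DeepMIM's actual design:

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def deeply_supervised_loss(features, target, heads, aux_weight=0.3):
    """Final-layer loss plus down-weighted auxiliary losses computed on
    intermediate features -- the classic deep-supervision recipe."""
    preds = [f @ w for f, w in zip(features, heads)]
    final = mse(preds[-1], target)
    aux = sum(mse(p, target) for p in preds[:-1])
    return final + aux_weight * aux

# toy example: two intermediate feature maps and one final feature map
features = [np.ones((2, 4)), np.full((2, 4), 2.0), np.zeros((2, 4))]
heads = [np.full((4, 1), 0.25)] * 3   # toy linear prediction heads
target = np.zeros((2, 1))
loss = deeply_supervised_loss(features, target, heads)
```

In masked image modeling the "target" would be the reconstruction objective rather than a class label, but the loss composition is the same.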

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

2 code implementations CVPR 2023 Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

Our tiny-sized TinyMIM model achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget.

Image Classification · Semantic Segmentation

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

1 code implementation 13 Jul 2022 Songhua Liu, Jingwen Ye, Sucheng Ren, Xinchao Wang

Prior approaches, despite the promising results, have relied on either estimating dense attention to compute per-point matching, which is limited to only coarse scales due to the quadratic memory cost, or fixing the number of correspondences to achieve linear complexity, which lacks flexibility.

Face Generation · Style Transfer

A Simple Data Mixing Prior for Improving Self-Supervised Learning

1 code implementation CVPR 2022 Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie

More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting.

Representation Learning · Self-Supervised Learning
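For readers unfamiliar with data mixing, the sketch below shows plain mixup, the kind of sample blending such priors build on; it is a generic illustration, not SDMP's actual pipeline:

```python
import numpy as np

def mixup(x1, x2, alpha=1.0, rng=None):
    """Blend two samples with a Beta-sampled coefficient; data-mixing
    priors exploit the known relation between the mixture and its
    sources when constructing self-supervised targets."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1.0 - lam) * x2, lam

x1, x2 = np.zeros(3), np.ones(3)
mixed, lam = mixup(x1, x2, rng=np.random.default_rng(0))
```

The mixing coefficient `lam` is known exactly, which is what lets a self-supervised method treat the relationship between `mixed`, `x1`, and `x2` as a free training signal.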

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

2 code implementations 13 Jun 2022 Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao

Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications.

Knowledge Distillation · Transfer Learning

Glance to Count: Learning to Rank with Anchors for Weakly-supervised Crowd Counting

no code implementations 29 May 2022 Zheng Xiong, Liangyu Chai, Wenxi Liu, Yongtuo Liu, Sucheng Ren, Shengfeng He

To enable training under this new setting, we convert the crowd count regression problem to a ranking potential prediction problem.

Crowd Counting · Learning-To-Rank
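A minimal sketch of such a ranking objective (illustrative only, not the paper's exact formulation): since a containing region can hold no fewer people than a crop taken from it, a hinge penalty can enforce that ordering on the predicted ranking potentials:

```python
import numpy as np

def ranking_hinge(score_container, score_crop, margin=0.0):
    """Hinge penalty that is zero when the containing region's predicted
    potential already exceeds its crop's by at least `margin`, and grows
    linearly with the violation otherwise."""
    return float(np.maximum(0.0, score_crop - score_container + margin))
```

Pairs of (region, crop) scores can thus supervise a network with only ordinal, rather than exact-count, labels.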

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

no code implementations 5 Apr 2022 Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao

Multimodal fusion emerges as an appealing technique for improving model performance on many tasks.

Self-supervision through Random Segments with Autoregressive Coding (RandSAC)

no code implementations 22 Mar 2022 Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal

We illustrate that randomized serialization of the segments significantly improves performance and yields a distribution over spatially long-range (across-segment) and short-range (within-segment) predictions that is effective for feature learning.

Representation Learning · Self-Supervised Learning

Shunted Self-Attention via Multi-Scale Token Aggregation

1 code implementation CVPR 2022 Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang

This novel merging scheme enables the self-attention to learn relationships between objects with different sizes and simultaneously reduces the token numbers and the computational cost.
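The token-merging idea can be caricatured as pooling the key/value tokens at a per-head rate before attention; the sketch below is a simplified, hypothetical illustration, not the paper's exact aggregation scheme:

```python
import numpy as np

def pool_tokens(tokens, rate):
    """Average every `rate` consecutive tokens, shrinking the key/value
    sequence length that self-attention must cover. A small rate keeps
    fine detail; a large rate captures coarse, large-object context."""
    n, d = tokens.shape
    keep = n // rate
    return tokens[: keep * rate].reshape(keep, rate, d).mean(axis=1)

tokens = np.arange(8.0).reshape(4, 2)   # 4 tokens, dim 2
coarse = pool_tokens(tokens, rate=2)    # 2 merged tokens
```

Running different heads at different pooling rates is what lets one attention layer see objects at multiple scales while cutting token count and compute.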

Reducing Spatial Labeling Redundancy for Semi-supervised Crowd Counting

no code implementations 6 Aug 2021 Yongtuo Liu, Sucheng Ren, Liangyu Chai, Hanjie Wu, Jing Qin, Dan Xu, Shengfeng He

In this way, we can transfer the original spatial labeling redundancy caused by individual similarities to effective supervision signals on the unlabeled regions.

Crowd Counting

Fine-grained Domain Adaptive Crowd Counting via Point-derived Segmentation

no code implementations 6 Aug 2021 Yongtuo Liu, Dan Xu, Sucheng Ren, Hanjie Wu, Hongmin Cai, Shengfeng He

To this end, we propose to untangle domain-invariant crowd and domain-specific background from crowd images, and design a fine-grained domain adaptation method for crowd counting.

Crowd Counting · Domain Adaptation +1

Unifying Global-Local Representations in Salient Object Detection with Transformer

1 code implementation 5 Aug 2021 Sucheng Ren, Qiang Wen, Nanxuan Zhao, Guoqiang Han, Shengfeng He

In this paper, we introduce a new attention-based encoder, vision transformer, into salient object detection to ensure the globalization of the representations from shallow to deep layers.

object-detection · Object Detection +1

Co-advise: Cross Inductive Bias Distillation

no code implementations CVPR 2022 Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao

Transformers have recently been adapted from the natural language processing community as a promising substitute for convolution-based neural networks in visual learning tasks.

Inductive Bias

Reciprocal Transformations for Unsupervised Video Object Segmentation

1 code implementation CVPR 2021 Sucheng Ren, Wenxi Liu, Yongtuo Liu, Haoxin Chen, Guoqiang Han, Shengfeng He

Additionally, to exclude information about moving background objects from the motion features, our transformation module reciprocally transforms the appearance features to enhance the motion features, focusing on moving objects with salient appearance while removing co-moving outliers.

Object · Optical Flow Estimation +3

Learning From the Master: Distilling Cross-Modal Advanced Knowledge for Lip Reading

no code implementations CVPR 2021 Sucheng Ren, Yong Du, Jianming Lv, Guoqiang Han, Shengfeng He

To these ends, we introduce a trainable "master" network which ingests both audio signals and silent lip videos instead of a pretrained teacher.

Lip Reading · Sentence +2

On Feature Decorrelation in Self-Supervised Learning

1 code implementation ICCV 2021 Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao

In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations.

Representation Learning · Self-Supervised Learning

Multimodal Knowledge Expansion

1 code implementation ICCV 2021 Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao

The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data.

Denoising · Knowledge Distillation +1

TENet: Triple Excitation Network for Video Salient Object Detection

no code implementations ECCV 2020 Sucheng Ren, Chu Han, Xin Yang, Guoqiang Han, Shengfeng He

In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects, spatial, temporal, and online excitations.

Object · object-detection +2
