Search Results for author: Sucheng Ren

Found 32 papers, 20 papers with code

M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

1 code implementation · 15 Nov 2024 · Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie

In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into intra-scale modeling, which captures local spatial dependencies within each scale, and inter-scale modeling, which models cross-scale relationships progressively from coarse to fine scales.

Image Generation, Mamba

Causal Image Modeling for Efficient Visual Understanding

1 code implementation · 10 Oct 2024 · Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

In this work, we present a comprehensive analysis of causal image modeling and introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations.

Causal Inference
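The Adventurer entry above treats an image as a sequence of patch tokens read by a uni-directional model. The sketch below illustrates only that input side under stated assumptions: non-overlapping square patches serialized in raster (row-major) order, and a standard lower-triangular causal mask. The helper names `patchify` and `causal_mask` are hypothetical and are not taken from the authors' code.

```python
from typing import List

Image = List[List[float]]  # toy single-channel H x W image
Patch = List[float]        # one flattened p x p patch


def patchify(image: Image, p: int) -> List[Patch]:
    """Split an H x W image into non-overlapping p x p patches in raster order."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, h, p):        # patch rows, top to bottom
        for left in range(0, w, p):   # patch columns, left to right
            patches.append([image[top + i][left + j] for i in range(p) for j in range(p)])
    return patches


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. when j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 toy image
    tokens = patchify(img, 2)
    print(len(tokens))   # 4
    print(tokens[1])     # [2.0, 3.0, 6.0, 7.0], the top-right patch
    for row in causal_mask(len(tokens)):
        print(row)
```

Because of the mask, each patch token can only use information from patches earlier in the raster order, which is what makes the model uni-directional.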

Autoregressive Pretraining with Mamba in Vision

1 code implementation · 11 Jun 2024 · Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Mamba

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

1 code implementation · 8 Jun 2024 · Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.

Conditional Image Generation, Denoising, +2

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

no code implementations · 24 May 2024 · Sucheng Ren, Hongru Zhu, Chen Wei, Yijiang Li, Alan Yuille, Cihang Xie

This paper presents a new self-supervised video representation learning framework, ARVideo, which autoregressively predicts the next video token in a tailored sequence order.

Representation Learning

Mamba-R: Vision Mamba ALSO Needs Registers

1 code implementation · 23 May 2024 · Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

As with Vision Transformers, this paper identifies artifacts in the feature maps of Vision Mamba.

Mamba, Semantic Segmentation

Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data

no code implementations · 11 Dec 2023 · Lei Zhang, Fangxun Shu, Tianyang Liu, Sucheng Ren, Hao Jiang, Cihang Xie

However, the vast scale of these datasets inevitably introduces significant variability in data quality, which can adversely affect model performance.

Image Captioning, Image-text Retrieval, +1

Rejuvenating image-GPT as Strong Visual Representation Learners

4 code implementations · 4 Dec 2023 · Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie

This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict the next pixels for visual representation learning.

Representation Learning

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

1 code implementation · 23 Aug 2023 · Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He

Non-photorealistic videos are in demand with the wave of the metaverse, but they lack sufficient research.

Saliency Detection

SG-Former: Self-guided Transformer with Evolving Token Reallocation

1 code implementation · ICCV 2023 · Sucheng Ren, Xingyi Yang, Songhua Liu, Xinchao Wang

At the heart of our approach is a significance map, estimated through hybrid-scale self-attention and evolving during training, which is used to reallocate tokens according to the significance of each region.

DeepMIM: Deep Supervision for Masked Image Modeling

1 code implementation · 15 Mar 2023 · Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which adds extra supervision to the intermediate features of a neural network, was widely used in image classification in the early deep learning era because it significantly reduces training difficulty and eases optimization compared with vanilla training, for example by avoiding vanishing gradients.

Image Classification, object-detection, +2
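The DeepMIM snippet above describes deep supervision only in general terms. Its generic form can be written as a weighted sum of the usual final loss and auxiliary losses on selected intermediate layers. This is the textbook formulation, not DeepMIM's exact objective; the set of supervised layers $\mathcal{S}$, the per-layer losses $\mathcal{L}_l$, and the weights $\lambda_l$ are illustrative symbols not taken from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{final}}
  + \sum_{l \in \mathcal{S}} \lambda_l \, \mathcal{L}_l ,
\qquad \lambda_l \ge 0
```

Each $\mathcal{L}_l$ gives layer $l$ a short gradient path to a training signal, which is why deep supervision is associated with easier optimization and fewer vanishing-gradient problems.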

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

2 code implementations · CVPR 2023 · Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget.

Image Classification, Semantic Segmentation

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

1 code implementation · 13 Jul 2022 · Songhua Liu, Jingwen Ye, Sucheng Ren, Xinchao Wang

Prior approaches, despite the promising results, have relied on either estimating dense attention to compute per-point matching, which is limited to only coarse scales due to the quadratic memory cost, or fixing the number of correspondences to achieve linear complexity, which lacks flexibility.

Face Generation, Style Transfer

A Simple Data Mixing Prior for Improving Self-Supervised Learning

1 code implementation · CVPR 2022 · Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie

More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting.

Representation Learning, Self-Supervised Learning

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

1 code implementation · 13 Jun 2022 · Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao

Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to multimodal learning and has demonstrated great success in various applications.

Knowledge Distillation, Transfer Learning

Glance to Count: Learning to Rank with Anchors for Weakly-supervised Crowd Counting

no code implementations · 29 May 2022 · Zheng Xiong, Liangyu Chai, Wenxi Liu, Yongtuo Liu, Sucheng Ren, Shengfeng He

To enable training under this new setting, we convert the crowd count regression problem to a ranking potential prediction problem.

Crowd Counting, Learning-To-Rank

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

no code implementations · 5 Apr 2022 · Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao

Multimodal fusion has emerged as an appealing technique for improving model performance on many tasks.

Self-supervision through Random Segments with Autoregressive Coding (RandSAC)

no code implementations · 22 Mar 2022 · Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal

We show that randomized serialization of the segments significantly improves performance and results in a distribution over spatially long (across-segment) and short (within-segment) predictions, which is effective for feature learning.

Decoder, Representation Learning, +1
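The RandSAC snippet above contrasts across-segment and within-segment prediction under a randomized segment order. A simplified sketch of that serialization idea follows: the token sequence is cut into contiguous segments, the order of segments is shuffled, and the order inside each segment is kept. Using fixed-length 1D chunks (rather than whatever spatial segments the paper uses), the `seg_len` parameter, and the function name `random_segment_order` are all assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_segment_order(tokens: Sequence[T], seg_len: int,
                         rng: Optional[random.Random] = None) -> List[T]:
    """Shuffle the order of contiguous segments while preserving order inside each one."""
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    rng = rng or random.Random()
    segments = [list(tokens[i:i + seg_len]) for i in range(0, len(tokens), seg_len)]
    rng.shuffle(segments)  # randomizes the across-segment order only
    return [tok for seg in segments for tok in seg]


if __name__ == "__main__":
    order = random_segment_order(list(range(8)), seg_len=2, rng=random.Random(0))
    print(order)  # pairs (0, 1), (2, 3), ... stay adjacent and in order
```

An autoregressive model trained on such an order predicts the next token inside a segment (a short-range prediction) and the first token of the next, randomly placed segment (a long-range prediction).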

Shunted Self-Attention via Multi-Scale Token Aggregation

1 code implementation · CVPR 2022 · Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang

This novel merging scheme enables the self-attention to learn relationships between objects with different sizes and simultaneously reduces the token numbers and the computational cost.
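The Shunted Self-Attention snippet above says that merging tokens reduces both the token count and the computational cost. The sketch below shows why, under stated assumptions: non-overlapping average pooling stands in for the learned aggregation, scalar values stand in for token features, and the function names and the 56x56 map size are hypothetical. Merging keys and values at rate `r` shrinks an H x W map to (H/r) x (W/r), so the number of query-key scores drops by a factor of r².

```python
from typing import List

Grid = List[List[float]]  # H x W map of scalar "token features" (toy stand-in)


def merge_tokens(grid: Grid, r: int) -> Grid:
    """Average-pool non-overlapping r x r windows: H x W tokens become (H/r) x (W/r)."""
    h, w = len(grid), len(grid[0])
    if h % r or w % r:
        raise ValueError("map size must be divisible by the merge rate")
    return [
        [sum(grid[i + a][j + b] for a in range(r) for b in range(r)) / (r * r)
         for j in range(0, w, r)]
        for i in range(0, h, r)
    ]


def attention_pairs(h: int, w: int, r: int) -> int:
    """Number of query-key scores when all H*W queries attend to keys merged at rate r."""
    return (h * w) * ((h // r) * (w // r))


if __name__ == "__main__":
    for rate in (1, 2, 4, 8):
        print(rate, attention_pairs(56, 56, rate))
```

Assigning different merge rates to different groups of heads in the same layer lets some heads attend to fine, small-scale tokens and others to coarse, large-scale ones, which matches the snippet's point about relating objects of different sizes.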

Fine-grained Domain Adaptive Crowd Counting via Point-derived Segmentation

no code implementations · 6 Aug 2021 · Yongtuo Liu, Dan Xu, Sucheng Ren, Hanjie Wu, Hongmin Cai, Shengfeng He

To this end, we propose to untangle domain-invariant crowd and domain-specific background from crowd images and design a fine-grained domain adaptation method for crowd counting.

Crowd Counting, Domain Adaptation, +1

Reducing Spatial Labeling Redundancy for Semi-supervised Crowd Counting

no code implementations · 6 Aug 2021 · Yongtuo Liu, Sucheng Ren, Liangyu Chai, Hanjie Wu, Jing Qin, Dan Xu, Shengfeng He

In this way, we can transfer the original spatial labeling redundancy caused by individual similarities to effective supervision signals on the unlabeled regions.

Crowd Counting

Unifying Global-Local Representations in Salient Object Detection with Transformer

1 code implementation · 5 Aug 2021 · Sucheng Ren, Qiang Wen, Nanxuan Zhao, Guoqiang Han, Shengfeng He

In this paper, we introduce a new attention-based encoder, the vision transformer, into salient object detection to keep representations global from shallow to deep layers.

Decoder, object-detection, +2

Co-advise: Cross Inductive Bias Distillation

no code implementations · CVPR 2022 · Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao

Transformers have recently been adapted from the natural language processing community as a promising substitute for convolution-based neural networks in visual learning tasks.

Inductive Bias

Reciprocal Transformations for Unsupervised Video Object Segmentation

1 code implementation · CVPR 2021 · Sucheng Ren, Wenxi Liu, Yongtuo Liu, Haoxin Chen, Guoqiang Han, Shengfeng He

Additionally, to exclude information about moving background objects from the motion features, our transformation module reciprocally transforms the appearance features to enhance the motion features, so as to focus on moving objects with salient appearance while removing co-moving outliers.

Object, Optical Flow Estimation, +3

Learning From the Master: Distilling Cross-Modal Advanced Knowledge for Lip Reading

no code implementations · CVPR 2021 · Sucheng Ren, Yong Du, Jianming Lv, Guoqiang Han, Shengfeng He

To these ends, we introduce a trainable "master" network which ingests both audio signals and silent lip videos instead of a pretrained teacher.

Lip Reading, Sentence, +2

On Feature Decorrelation in Self-Supervised Learning

1 code implementation · ICCV 2021 · Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao

In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations.

Representation Learning, Self-Supervised Learning

Multimodal Knowledge Expansion

1 code implementation · ICCV 2021 · Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao

The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data.

Denoising, Knowledge Distillation, +1

TENet: Triple Excitation Network for Video Salient Object Detection

no code implementations · ECCV 2020 · Sucheng Ren, Chu Han, Xin Yang, Guoqiang Han, Shengfeng He

In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects: spatial, temporal, and online excitation.

Object, object-detection, +2
