1 code implementation • 24 Oct 2024 • Zhengqiang Zhang, Ruihuang Li, Lei Zhang
While image generation with diffusion models has achieved great success, generating images of higher resolution than the training size remains a challenging task due to the high computational cost.
no code implementations • 13 Jul 2024 • Ruihuang Li, Zhengqiang Zhang, Chenhang He, Zhiyuan Ma, Vishal M. Patel, Lei Zhang
Recent vision-language pre-training models have exhibited remarkable generalization ability in zero-shot recognition tasks.
no code implementations • 25 Jun 2024 • Ruihuang Li, Liyi Chen, Zhengqiang Zhang, Varun Jampani, Vishal M. Patel, Lei Zhang
Meanwhile, 2D diffusion models also exhibit substantial potential for 3D editing tasks.
1 code implementation • 17 Mar 2024 • Ruibin Li, Ruihuang Li, Song Guo, Lei Zhang
Text-driven diffusion models have significantly advanced image editing performance by using text prompts as inputs.
1 code implementation • 1 Jan 2024 • Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang
In this paper, we introduce ScatterFormer, which, to the best of our knowledge, is the first to directly apply attention to voxels across different windows as a single sequence.
1 code implementation • 15 Dec 2023 • Zhengqiang Zhang, Ruihuang Li, Shi Guo, Yang Cao, Lei Zhang
Online video super-resolution (online-VSR) relies heavily on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging.
1 code implementation • CVPR 2023 • Shuai Li, Minghan Li, Ruihuang Li, Chenhang He, Lei Zhang
The positive and negative weights of these soft anchors are dynamically adjusted during training so that they can contribute more to "representation learning" in the early training stage, and contribute more to "duplicated prediction removal" in the later stage.
1 code implementation • CVPR 2023 • Chenhang He, Ruihuang Li, Yabin Zhang, Shuai Li, Lei Zhang
Current top-performing multi-frame detectors mostly follow a Detect-and-Fuse framework, which extracts features from each frame of the sequence and fuses them to detect the objects in the current frame.
1 code implementation • CVPR 2023 • Ruihuang Li, Chenhang He, Yabin Zhang, Shuai Li, Liyi Chen, Lei Zhang
Weakly supervised instance segmentation using only bounding box annotations has recently attracted much research attention.
1 code implementation • CVPR 2023 • Ruihuang Li, Chenhang He, Shuai Li, Yabin Zhang, Lei Zhang
Representative instance segmentation methods mostly segment different object instances with a mask of fixed resolution, e.g., a 28×28 grid.
no code implementations • 30 Jan 2023 • Yabin Zhang, Bin Deng, Ruihuang Li, Kui Jia, Lei Zhang
By updating the model against the adversarial statistics perturbation during training, we allow the model to explore the worst-case domain and hence improve its generalization performance.
1 code implementation • ICCV 2023 • Liyi Chen, Chenyang Lei, Ruihuang Li, Shuai Li, Zhaoxiang Zhang, Lei Zhang
Without introducing any external supervision or human priors, the proposed FPR effectively suppresses wrong activations from the background objects.
Weakly-Supervised Semantic Segmentation
1 code implementation • 13 Nov 2022 • Yabin Zhang, Jiehong Lin, Ruihuang Li, Kui Jia, Lei Zhang
We also validate the effectiveness of affine transformation corruption with the Transformer backbones, where we decompose the reconstruction of the complete point cloud into the reconstructions of detailed local patches and rough global shape, alleviating the position leakage problem in the reconstruction.
1 code implementation • CVPR 2022 • Takashi Isobe, Xu Jia, Xin Tao, Changlin Li, Ruihuang Li, Yongjie Shi, Jing Mu, Huchuan Lu, Yu-Wing Tai
Instead of directly feeding consecutive frames into a VSR model, we propose to compute the temporal difference between frames and divide those pixels into two subsets according to the level of difference.
1 code implementation • CVPR 2022 • Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang
VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel by two cross-attentions and models features in a hidden space induced by a group of latent codes.
1 code implementation • CVPR 2022 • Ruihuang Li, Shuai Li, Chenhang He, Yabin Zhang, Xu Jia, Lei Zhang
One popular solution to this challenging task is self-training, which selects high-scoring predictions on target samples as pseudo labels for training.
Ranked #9 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
1 code implementation • CVPR 2022 • Shuai Li, Chenhang He, Ruihuang Li, Lei Zhang
Existing LA methods mostly focus on the design of the pos weighting function, while the neg weight is directly derived from the pos weight.
2 code implementations • CVPR 2022 • Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, Lei Zhang
In this work, we propose, for the first time to the best of our knowledge, to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which can be implemented by applying Exact Histogram Matching (EHM) in the image feature space.
Ranked #7 on Style Transfer on StyleBench
1 code implementation • ICCV 2021 • Ruihuang Li, Xu Jia, Jianzhong He, Shuaijun Chen, Qinghua Hu
Most existing domain adaptation methods focus on adaptation from only one source domain; however, in practice there are a number of relevant sources that could be leveraged to help improve performance on the target domain.
Ranked #2 on Unsupervised Domain Adaptation on PACS
no code implementations • ICCV 2019 • Ruihuang Li, Changqing Zhang, Huazhu Fu, Xi Peng, Tianyi Zhou, Qinghua Hu
Multi-view clustering is a long-standing and important research topic; however, it remains challenging when handling high-dimensional data while simultaneously exploring the consistency and complementarity of different views.