no code implementations • 5 Jul 2025 • Hanzhe Liang, Jie Zhang, Tao Dai, Linlin Shen, Jinbao Wang, Can Gao
In this study, a Down-Up Sampling Network (DUS-Net) is proposed to reconstruct high-precision point clouds for 3D anomaly detection by preserving the group center geometric structure.
Ranked #1 on 3D Anomaly Detection on Real 3D-AD
no code implementations • 24 May 2025 • Guanghao Meng, Sunan He, Jinpeng Wang, Tao Dai, Letian Zhang, Jieming Zhu, Qing Li, Gang Wang, Rui Zhang, Yong Jiang
To address this problem, we propose the Entity Visual Description enhanced CLIP (EvdCLIP), designed to leverage the visual knowledge of entities to enrich queries.
1 code implementation • 22 May 2025 • Naiqi Li, Peiyuan Liu, Zheng Liu, Tao Dai, Yong Jiang, Shu-Tao Xia
This hybrid approach combines the natural language understanding of LLMs with the precise reasoning capabilities of logic programs.
1 code implementation • 21 May 2025 • Naiqi Li, Yuqiu Xie, Peiyuan Liu, Tao Dai, Yong Jiang, Shu-Tao Xia
To overcome this difficulty, various relaxations of the rank function were studied.
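This entry mentions relaxations of the rank function without naming them. The most common one is the nuclear norm, which is the sum of the singular values and the convex envelope of rank on the unit spectral-norm ball. The sketch below contrasts the two, assuming NumPy. It is a generic illustration and not the method of the listed paper. The helper name `matrix_rank_and_nuclear_norm` is hypothetical.

```python
import numpy as np

# Generic illustration (assumption): nuclear norm as a convex surrogate of rank.
# Not taken from the listed paper; the helper name is hypothetical.
def matrix_rank_and_nuclear_norm(a: np.ndarray, tol: float = 1e-10):
    """Return (rank, nuclear norm) of a matrix.

    rank counts singular values above tol (non-convex, discontinuous);
    the nuclear norm sums all singular values (convex).
    """
    s = np.linalg.svd(a, compute_uv=False)  # singular values, descending
    rank = int(np.sum(s > tol))
    nuclear = float(np.sum(s))
    return rank, nuclear

# A rank-1 outer product: its only nonzero singular value is |u| * |v| = 3 * 5.
u = np.array([1.0, 2.0, 2.0])  # Euclidean norm 3
v = np.array([3.0, 4.0])       # Euclidean norm 5
a = np.outer(u, v)
print(matrix_rank_and_nuclear_norm(a))  # rank 1, nuclear norm approximately 15
```

Because the nuclear norm is convex, minimizing it can be done with standard convex solvers. This is why it is the usual stand-in when a rank penalty makes an objective intractable.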
no code implementations • 4 May 2025 • Jiayi Cheng, Can Gao, Jie zhou, Jiajun Wen, Tao Dai, Jinbao Wang
Therefore, this paper presents a novel unified model for Multi-Category 3D Anomaly Detection (MC3D-AD) that aims to utilize both local and global geometry-aware information to reconstruct normal representations of all categories.
Ranked #3 on 3D Anomaly Detection on Anomaly-ShapeNet
1 code implementation • CVPR 2025 • Jinpeng Wang, Tianci Luo, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, YaoWei Wang, Shu-Tao Xia
Visual In-Context Learning (VICL) enables vision tasks to be solved adaptively by leveraging pixel demonstrations, mimicking human-like task completion through analogy.
1 code implementation • 30 Mar 2025 • Hang Guo, Yawei Li, Taolin Zhang, Jiangshan Wang, Tao Dai, Shu-Tao Xia, Luca Benini
We further extend FastVAR to zero-shot generation of higher resolution images.
1 code implementation • CVPR 2025 • Haitong Liu, Kuofeng Gao, Yang Bai, Jinmin Li, Jinxiao Shan, Tao Dai, Shu-Tao Xia
Extensive experiments demonstrate that our video watermarking methods effectively protect video data by significantly reducing video annotation performance across various video-based LLMs, showcasing both stealthiness and robustness in protecting personal video content.
1 code implementation • 26 Feb 2025 • Yifan Hu, Yuante Li, Peiyuan Liu, Yuxia Zhu, Naiqi Li, Tao Dai, Shu-Tao Xia, Dawei Cheng, Changjun Jiang
Addressing these limitations, we propose FinTSB, a comprehensive and practical benchmark for financial time series forecasting (FinTSF).
1 code implementation • 22 Jan 2025 • Yifan Hu, Guibin Zhang, Peiyuan Liu, Disen Lan, Naiqi Li, Dawei Cheng, Tao Dai, Shu-Tao Xia, Shirui Pan
Time series forecasting methods generally fall into two main categories: Channel Independent (CI) and Channel Dependent (CD) strategies.
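To make the Channel Independent (CI) versus Channel Dependent (CD) distinction concrete, here is a minimal sketch with linear forecasters. It assumes NumPy and random, untrained weights, and it is not the model proposed in the paper. Under CI, one map is shared by all channels and each forecast sees only its own channel's history. Under CD, a joint map lets each forecast mix in the other channels.

```python
import numpy as np

# Generic illustration (assumption): untrained linear CI vs. CD forecasters.
rng = np.random.default_rng(0)
C, L, H = 3, 8, 4  # channels, lookback length, forecast horizon

# Channel-independent: one (H, L) weight matrix shared by every channel.
w_ci = rng.normal(size=(H, L))

def forecast_ci(x):
    # x: (C, L) -> (C, H); row c of the output depends only on x[c]
    return x @ w_ci.T

# Channel-dependent: a joint map over the flattened history of all channels.
w_cd = rng.normal(size=(C * H, C * L))

def forecast_cd(x):
    return (w_cd @ x.reshape(-1)).reshape(C, H)

x = rng.normal(size=(C, L))
x_perturbed = x.copy()
x_perturbed[1] += 1.0  # change only channel 1

# CI: channel 0's forecast ignores the change in channel 1.
print(np.allclose(forecast_ci(x)[0], forecast_ci(x_perturbed)[0]))  # True
# CD: channel 0's forecast generally shifts as well.
print(np.allclose(forecast_cd(x)[0], forecast_cd(x_perturbed)[0]))
```

The CI design shares parameters across channels and cannot overfit spurious cross-channel correlations. The CD design can exploit real inter-channel dependencies at the cost of more parameters.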
no code implementations • CVPR 2025 • Yaohua Zha, Yanzi Wang, Hang Guo, Jinpeng Wang, Tao Dai, Bin Chen, Zhihao Ouyang, Xue Yuerong, Ke Chen, Shu-Tao Xia
To overcome this limitation, we propose an orthogonal solution: Point Mamba Adapter (PMA), which constructs an ordered feature sequence from all layers of the pre-trained model and leverages Mamba to fuse all complementary semantics, thereby promoting comprehensive point cloud understanding.
no code implementations • CVPR 2025 • Baixuan Lv, Yaohua Zha, Tao Dai, Xue Yuerong, Ke Chen, Shu-Tao Xia
Point cloud video understanding is becoming increasingly important in fields such as robotics, autonomous driving, and augmented reality, as point cloud videos can accurately represent object motion and environmental changes.
1 code implementation • 21 Dec 2024 • Jiarui Yang, Tao Dai, Yufei Zhu, Naiqi Li, Jinmin Li, Shutao Xia
In this paper, we propose a masking strategy with strong and weak constraints and iterative refinement for real-world FSR, termed Diffusion Prior Interpolation (DPI).
1 code implementation • CVPR 2025 • Hang Guo, Yong Guo, Yaohua Zha, Yulun Zhang, Wenbo Li, Tao Dai, Shu-Tao Xia, Yawei Li
Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global receptive fields and computational efficiency.
1 code implementation • 29 Oct 2024 • Hang Guo, Yawei Li, Tao Dai, Shu-Tao Xia, Luca Benini
Fine-tuning pre-trained diffusion models under limited budgets has achieved great success.
1 code implementation • 20 Oct 2024 • Taolin Zhang, Jinpeng Wang, Hang Guo, Tao Dai, Bin Chen, Shu-Tao Xia
The historical samples are filtered from the testing data stream and serve to extract useful information from the target distribution, while the boosting samples are drawn from regional bootstrapping and capture the knowledge of the test sample itself.
no code implementations • 13 Oct 2024 • Yaohua Zha, Tao Dai, Yanzi Wang, Hang Guo, Taolin Zhang, Zhihao Ouyang, Chunlin Fan, Bin Chen, Ke Chen, Shu-Tao Xia
We first propose a hybrid-domain masked autoencoder consisting of an encoder and decoder belonging to the scene domain and object domain, respectively.
no code implementations • 12 Oct 2024 • Taolin Zhang, Junwei Pan, Jinpeng Wang, Yaohua Zha, Tao Dai, Bin Chen, Ruisheng Luo, Xiaoxiang Deng, YuAn Wang, Ming Yue, Jie Jiang, Shu-Tao Xia
With recent advances in large language models (LLMs), a growing body of research has developed LLM-based Semantic IDs to enhance the performance of recommendation systems.
1 code implementation • 8 Oct 2024 • Hang Guo, Tao Dai, Zhihao Ouyang, Taolin Zhang, Yaohua Zha, Bin Chen, Shu-Tao Xia
In this paper, we propose an orthogonal solution called the Retrieval-augmented Framework for Image Restoration (ReFIR), which incorporates retrieved images as external knowledge to extend the knowledge boundary of existing LRMs in generating details faithful to the original scene.
1 code implementation • 6 Oct 2024 • Peiyuan Liu, Beiliang Wu, Yifan Hu, Naiqi Li, Tao Dai, Jigang Bao, Shu-Tao Xia
Eliminating non-stationarity is essential for avoiding spurious regressions and capturing local dependencies in short-term modeling, while preserving it is crucial for revealing long-term cointegration across variates.
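One widely used way to remove non-stationarity within an input window is per-instance normalization, as in RevIN-style pipelines. Each window is standardized per channel, the forecast is made in the normalized space, and the window statistics are restored afterwards. The sketch below assumes NumPy, and the function names are hypothetical. It only shows the elimination half of the trade-off described in this entry. It is not the listed paper's method, which argues that preserving non-stationarity also matters.

```python
import numpy as np

# Generic illustration (assumption): RevIN-style per-instance normalization.
def normalize(x, eps=1e-5):
    # x: (channels, lookback); statistics are computed per channel, per window.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True) + eps
    return (x - mean) / std, (mean, std)

def denormalize(y, stats):
    # Restore the level and scale removed by normalize().
    mean, std = stats
    return y * std + mean

t = np.arange(48, dtype=float)
x = np.stack([0.5 * t + 10.0, 3.0 * np.sin(t) - 2.0])  # trending and shifted channels
z, stats = normalize(x)

# A trivial stand-in "model": repeat the last normalized value for 4 steps.
y_norm = np.repeat(z[:, -1:], 4, axis=1)
y = denormalize(y_norm, stats)
print(np.allclose(y[:, 0], x[:, -1]))  # True
```

Normalizing each window removes level shifts and scale drift that would otherwise look like spurious signal. The same operation also discards the long-term level information that the entry says is needed for cointegration across variates.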
no code implementations • 20 Aug 2024 • Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, Shu-Tao Xia
Recently, image-to-3D approaches have significantly advanced the generation quality and speed of 3D assets based on large reconstruction models, particularly 3D Gaussian reconstruction models.
no code implementations • 8 Aug 2024 • Hongze Zhu, Guoyang Xie, Chengbin Hou, Tao Dai, Can Gao, Jinbao Wang, Linlin Shen
High-resolution point cloud (HRPCD) anomaly detection (AD) plays a critical role in precision machining and high-end equipment manufacturing.
Ranked #7 on 3D Anomaly Detection on Real 3D-AD
1 code implementation • 14 Jul 2024 • Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia
Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios.
no code implementations • 12 Jul 2024 • Yaohua Zha, Yanzi Wang, Tao Dai, Shu-Tao Xia
Firstly, the positional embedding of masked patches in the decoder results in the leakage of their central coordinates, leading to limited 3D representations.
no code implementations • 9 Jun 2024 • Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Tao Dai, Meikang Qiu, Shu-Tao Xia
Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy.
1 code implementation • 6 Jun 2024 • Yifan Hu, Peiyuan Liu, Peng Zhu, Dawei Cheng, Tao Dai
Transformer-based and MLP-based methods have emerged as leading approaches in time series forecasting (TSF).
Ranked #14 on Time Series Forecasting on ETTh1 (336) Multivariate
1 code implementation • 27 May 2024 • Yaohua Zha, Naiqi Li, Yanzi Wang, Tao Dai, Hang Guo, Bin Chen, Zhi Wang, Zhihao Ouyang, Shu-Tao Xia
Considering the varying information density between masked and unmasked patches in the decoder inputs of MPM, we introduce a locally constrained Mamba-based decoder.
1 code implementation • 24 May 2024 • Shiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shutao Xia, YaoWei Wang
Learned visual compression is an important and active task in multimedia.
1 code implementation • 22 May 2024 • Yuting Wang, Jinpeng Wang, Bin Chen, Tao Dai, Ruisheng Luo, Shu-Tao Xia
Given a text query, partially relevant video retrieval (PRVR) aims to retrieve untrimmed videos containing relevant moments.
no code implementations • 5 May 2024 • Jinmin Li, Tao Dai, Yaohua Zha, Yilu Luo, Longfei Lu, Bin Chen, Zhi Wang, Shu-Tao Xia, Jingyun Zhang
To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling by learning a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution.
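Invertible rescaling networks such as IRN are commonly built from coupling layers, which are bijective by construction. The sketch below shows a single additive (NICE-style) coupling layer on flat vectors to illustrate why such a mapping can be inverted exactly. It assumes NumPy, the weights `W` and the function names are hypothetical, and it is not the IRRM architecture.

```python
import numpy as np

# Generic illustration (assumption): one additive coupling layer, not IRRM.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))  # hypothetical weights of the coupling sub-network

def shift_net(h):
    # Any function works here; inverting the layer never requires inverting it.
    return np.tanh(h @ W)

def coupling_forward(x):
    # Keep the first half, shift the second half by a function of the first.
    x1, x2 = x[..., :4], x[..., 4:]
    return np.concatenate([x1, x2 + shift_net(x1)], axis=-1)

def coupling_inverse(y):
    # The first half passes through unchanged, so the shift can be recomputed.
    y1, y2 = y[..., :4], y[..., 4:]
    return np.concatenate([y1, y2 - shift_net(y1)], axis=-1)

x = rng.normal(size=(2, 8))
y = coupling_forward(x)
print(np.allclose(coupling_inverse(y), x))  # True
```

Stacking such layers, with the halves swapped between layers, yields an expressive network whose inverse is available in closed form.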
no code implementations • 5 May 2024 • Jinmin Li, Tao Dai, Jingyun Zhang, Kang Liu, Jun Wang, Shaoming Wang, Shu-Tao Xia, rizen guo
Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling.
1 code implementation • 2 Apr 2024 • Yushen Li, Jinpeng Wang, Tao Dai, Jieming Zhu, Jun Yuan, Rui Zhang, Shu-Tao Xia
Predicting click-through rates (CTR) is a fundamental task for Web applications, where a key issue is to devise effective models for feature interactions.
2 code implementations • 12 Mar 2024 • Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-Tao Xia
Unlike existing methods that train models on a single modality of time series input, multivariate time series forecasting (MTSF) methods based on large language models (LLMs), which take cross-modal text and time series input, have recently shown clear superiority, especially when temporal data are limited.
Ranked #42 on Time Series Forecasting on ETTh1 (336) Multivariate
Knowledge Distillation • Multivariate Time Series Forecasting
2 code implementations • 23 Feb 2024 • Hang Guo, Jinmin Li, Tao Dai, Zhihao Ouyang, Xudong Ren, Shu-Tao Xia
In this way, our MambaIR takes advantage of the local pixel similarity and reduces the channel redundancy.
Ranked #19 on Image Super-Resolution on Set14 - 4x upscaling
no code implementations • 8 Feb 2024 • Qianchen Mao, Qiang Li, Bingshu Wang, Yongjun Zhang, Tao Dai, C. L. Philip Chen
To tackle this challenge, we propose SpirDet, a novel approach for efficient detection of infrared small targets.
1 code implementation • 17 Dec 2023 • Yaohua Zha, Huizhen Ji, Jinmin Li, Rongsheng Li, Tao Dai, Bin Chen, Zhi Wang, Shu-Tao Xia
Specifically, to learn more compact features, a shared-parameter Transformer encoder is introduced to extract point features from the global and local unmasked patches obtained by global random and local block mask strategies, followed by a specific decoder for reconstruction.
Ranked #5 on Few-Shot 3D Point Cloud Classification on ModelNet40 10-way (10-shot) (using extra training data)
1 code implementation • 12 Dec 2023 • Hang Guo, Tao Dai, Yuanchao Bai, Bin Chen, Xudong Ren, Zexuan Zhu, Shu-Tao Xia
In this work, we introduce an alternative solution to improve the generalization of image restoration models.
no code implementations • 23 Nov 2023 • Shiyu Qin, Yimin Zhou, Jinpeng Wang, Bin Chen, Baoyi An, Tao Dai, Shu-Tao Xia
In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression.
no code implementations • 23 Nov 2023 • Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia
The explosion of data has resulted in more and more associated text being transmitted along with images.
1 code implementation • 20 Sep 2023 • Peiyuan Liu, Beiliang Wu, Naiqi Li, Tao Dai, Fengmao Lei, Jigang Bao, Yong Jiang, Shu-Tao Xia
In this paper, we propose a Wavelet-Fourier Transform Network (WFTNet) for long-term time series forecasting.
1 code implementation • 5 Aug 2023 • Hang Guo, Tao Dai, Mingyan Zhu, Guanghao Meng, Bin Chen, Zhi Wang, Shu-Tao Xia
Current solutions for low-resolution text recognition (LTR) typically rely on a two-stage pipeline that involves super-resolution as the first stage followed by the second-stage recognition.
1 code implementation • 19 Jul 2023 • Hang Guo, Tao Dai, Guanghao Meng, Shu-Tao Xia
Scene text image super-resolution (STISR), aiming to improve image quality while boosting downstream scene text recognition accuracy, has recently achieved great success.
3 code implementations • ICCV 2023 • Yaohua Zha, Jinpeng Wang, Tao Dai, Bin Chen, Zhi Wang, Shu-Tao Xia
To overcome this limitation, we propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models.
3D Parameter-Efficient Fine-Tuning for Classification • Few-Shot 3D Point Cloud Classification
no code implementations • ICCV 2023 • Xinyi Zhang, Naiqi Li, Jiawei Li, Tao Dai, Yong Jiang, Shu-Tao Xia
Unsupervised surface anomaly detection aims at discovering and localizing anomalous patterns using only anomaly-free training samples.
1 code implementation • 16 Oct 2022 • Yuyuan Zeng, Bowen Zhao, Shanzhao Qiu, Tao Dai, Shu-Tao Xia
Most existing methods mainly focus on extracting global features from tampered images, while neglecting the relationships of local features between tampered and authentic regions within a single tampered image.
no code implementations • 6 Sep 2022 • Yujun Huang, Bin Chen, Shiyu Qin, Jiawei Li, YaoWei Wang, Tao Dai, Shu-Tao Xia
Specifically, MSFDPM consists of a side information feature extractor, a multi-scale feature domain patch matching module, and a multi-scale feature fusion network.
no code implementations • 19 Aug 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Chen Wu, Xiujun Shu, Bo Ren
Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data.
1 code implementation • 7 Aug 2022 • Hongwei Li, Tao Dai, Yiming Li, Xueyi Zou, Shu-Tao Xia
Image representation is critical for many visual tasks.
1 code implementation • 5 Jul 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Bo Ren, Shu-Tao Xia
Specifically, our method exploits multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model.
Ranked #1 on Multi-label zero-shot learning on Open Images V4
no code implementations • 19 May 2022 • Qiang Li, Tao Dai, Shu-Tao Xia
Recently, deep learning methods have shown great success in 3D point cloud upsampling.
1 code implementation • 11 Sep 2021 • Jinpeng Wang, Ziyun Zeng, Bin Chen, Tao Dai, Shu-Tao Xia
The high efficiency in computation and storage makes hashing (including binary hashing and quantization) a common strategy in large-scale retrieval systems.
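To illustrate why hashing is cheap at retrieval time, the sketch below uses random-hyperplane hashing. This classic data-independent scheme takes the sign of random projections, stores each item as a short bit code, and ranks items by Hamming distance. It is not the learned hashing method of the listed paper. NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np

# Generic illustration (assumption): random-hyperplane hashing, not a learned hash.
def binary_codes(features, projection):
    # One bit per projection direction: which side of the hyperplane the item lies on.
    return (features @ projection > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    # Hamming distance = number of differing bits; smaller means more similar.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

rng = np.random.default_rng(0)
dim, n_bits = 16, 32
projection = rng.normal(size=(dim, n_bits))

db = rng.normal(size=(100, dim))
query = db[42] + 0.01 * rng.normal(size=dim)  # near-duplicate of item 42

order, dists = hamming_rank(binary_codes(query, projection),
                            binary_codes(db, projection))
print(order[0], dists[order[0]])
```

Each 16-dimensional float64 vector (128 bytes) is reduced to 32 bits. Comparing codes only counts differing bits, which is what makes binary hashing attractive for large-scale retrieval.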
no code implementations • 11 Sep 2021 • Ziyun Zeng, Jinpeng Wang, Bin Chen, Tao Dai, Shu-Tao Xia, Zhi Wang
To improve fine-grained image hashing, we propose Pyramid Hybrid Pooling Quantization (PHPQ).
no code implementations • 18 Oct 2020 • Xingchun Xiang, Qingtao Tang, Huaixuan Zhang, Tao Dai, Jiawei Li, Shu-Tao Xia
To address this issue, we propose a novel regression tree, named James-Stein Regression Tree (JSRT), that considers global information from different nodes.
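JSRT takes its name from James-Stein shrinkage, which pools information across several noisy estimates. The sketch below shows the textbook positive-part James-Stein estimator that shrinks a set of noisy means toward their grand mean, assuming a known noise variance and NumPy. How JSRT applies shrinkage to tree nodes is not shown, and the function name is hypothetical.

```python
import numpy as np

# Generic textbook estimator (assumption): not JSRT itself; name is hypothetical.
def james_stein_toward_mean(x, sigma2):
    """Positive-part James-Stein estimate of p means, shrunk toward their grand mean.

    x: observed means, one per group (e.g. per-leaf averages), each with known
    noise variance sigma2. Shrinking toward the estimated grand mean needs p >= 4.
    """
    x = np.asarray(x, dtype=float)
    p = x.size
    if p < 4:
        raise ValueError("shrinkage toward the grand mean needs p >= 4")
    grand = x.mean()
    resid = x - grand
    ss = float(np.sum(resid ** 2))
    if ss == 0.0:
        return x.copy()  # all observations equal: nothing to shrink
    # Positive part: never shrink past the grand mean.
    factor = max(0.0, 1.0 - (p - 3) * sigma2 / ss)
    return grand + factor * resid

leaf_means = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(james_stein_toward_mean(leaf_means, sigma2=1.0))  # approx. [1.4, 2.2, 3.0, 3.8, 4.6]
```

Noisier observations (larger `sigma2`) pull every estimate more strongly toward the grand mean. With `sigma2=100` in this example, all five estimates collapse to 3.0.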
no code implementations • 16 Oct 2020 • Shudeng Wu, Tao Dai, Shu-Tao Xia
Recently, deep neural networks (DNNs) have been widely and successfully used in object detection, e.g.
no code implementations • 26 Feb 2020 • Yan Feng, Bin Chen, Tao Dai, Shu-Tao Xia
Deep product quantization network (DPQN) has recently received much attention in fast image retrieval tasks due to its efficiency in encoding high-dimensional visual features, especially when dealing with large-scale datasets.
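To make "encoding high-dimensional features" concrete, here is a minimal, non-deep product quantization (PQ) sketch. It splits each vector into sub-vectors, learns a small k-means codebook per subspace, and stores only the centroid indices. DPQN learns its codes end-to-end with a network, so this sketch only shows the underlying PQ encoding and is not the paper's method. NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np

# Generic illustration (assumption): classic PQ with per-subspace k-means, not DPQN.
def train_pq(x, n_sub, n_centroids, n_iter=10, seed=0):
    """Learn one k-means codebook per sub-vector using plain Lloyd iterations."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    assert d % n_sub == 0, "dimension must split evenly into sub-vectors"
    sub_d = d // n_sub
    codebooks = []
    for m in range(n_sub):
        xs = x[:, m * sub_d:(m + 1) * sub_d]
        cents = xs[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            assign = np.argmin(((xs[:, None, :] - cents[None]) ** 2).sum(-1), axis=1)
            for k in range(n_centroids):
                if np.any(assign == k):  # keep the old centroid if a cluster is empty
                    cents[k] = xs[assign == k].mean(axis=0)
        codebooks.append(cents)
    return codebooks

def pq_encode(x, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    sub_d = codebooks[0].shape[1]
    codes = [np.argmin(((x[:, m * sub_d:(m + 1) * sub_d][:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
             for m, cb in enumerate(codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

def pq_decode(codes, codebooks):
    """Approximate each vector by concatenating its selected centroids."""
    return np.concatenate([cb[codes[:, m]] for m, cb in enumerate(codebooks)], axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 16)).astype(np.float32)
books = train_pq(x, n_sub=4, n_centroids=16)
codes = pq_encode(x, books)  # 4 bytes per vector instead of 64
err = np.mean((x - pq_decode(codes, books)) ** 2)
print(codes.shape, codes.dtype, round(float(err), 3))
```

The codebook size per subspace is small, but the number of representable vectors grows as 16^4 here. This combinatorial effect is what gives PQ its compact yet expressive codes.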
1 code implementation • CVPR 2019 • Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, Lei Zhang
Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and obtained remarkable performance.
Ranked #13 on Image Super-Resolution on BSD100 - 4x upscaling
no code implementations • WS 2018 • Jilei Wang, Shiying Luo, Weiyan Shi, Tao Dai, Shu-Tao Xia
Learning vector space representations of words (i.e., word embeddings) has recently attracted wide research interest and has been extended to the cross-lingual scenario.