2 code implementations • 11 Jun 2025 • Yuting Li, Lai Wei, Kaipeng Zheng, Jingyuan Huang, Linghe Kong, Lichao Sun, Weiran Huang
Our findings highlight the critical role of visual perturbation in multimodal mathematical reasoning: better reasoning begins with better seeing.
no code implementations • 10 Jun 2025 • Chunming He, Kai Li, Yachao Zhang, Ziyun Yang, Youwei Pang, Longxiang Tang, Chengyu Fang, Yulun Zhang, Linghe Kong, Xiu Li, Sina Farsiu
To mitigate the effect of low-quality segmentation masks, we introduce a series of strategies for pseudo-label generation, storage, and supervision.
no code implementations • 30 May 2025 • Xianglong Yan, Zhiteng Li, Tianao Zhang, Linghe Kong, Yulun Zhang, Xiaokang Yang
Recent methods have explored reducing the hidden dimensions of the KV cache, but many introduce additional computation through projection layers or suffer from significant performance degradation under high compression ratios.
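A minimal sketch of the general idea referenced here (reducing the KV cache's hidden dimension with a projection and reconstructing it at use time); the shapes and projection matrices below are hypothetical and not taken from the paper:

```python
# Illustrative only: shrink cached keys with a low-rank projection, then
# approximately reconstruct them when attention needs the full dimension.
import torch

d_model, d_low, seq_len = 64, 16, 128               # hypothetical sizes
down = torch.randn(d_model, d_low) / d_model ** 0.5  # compression projection
up = torch.randn(d_low, d_model) / d_low ** 0.5      # reconstruction projection

k = torch.randn(seq_len, d_model)   # keys produced during decoding
k_cached = k @ down                 # store only the low-dimensional cache
k_restored = k_cached @ up          # approximate keys when attending

print(k_cached.shape)    # torch.Size([128, 16]) -- 4x smaller cache
print(k_restored.shape)  # torch.Size([128, 64])
```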
2 code implementations • 28 May 2025 • Lai Wei, Yuting Li, Chen Wang, Yue Wang, Linghe Kong, Weiran Huang, Lichao Sun
Overall, MM-UPT offers a new paradigm for continual, autonomous enhancement of MLLMs in the absence of external supervision.
2 code implementations • 28 May 2025 • Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang
Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress.
1 code implementation • 24 May 2025 • Zhiteng Li, Hanxuan Li, Junyi Wu, Kai Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang
Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment.
no code implementations • 8 May 2025 • Kai Liu, Qian Zheng, Kaiwen Tao, Zhiteng Li, Haotong Qin, Wenbo Li, Yong Guo, Xianglong Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang
Therefore, it has become increasingly popular and critical to investigate how to perform the conversion and how to compensate for the information loss.
no code implementations • 12 Mar 2025 • Haoyuan Gao, Zicong Zhang, Yuqi Wei, Linglan Zhao, Guilin Li, Yexin Li, Linghe Kong, Weiran Huang
Vision-Language Models (VLMs) represent a breakthrough in artificial intelligence by integrating visual and textual modalities to achieve impressive zero-shot capabilities.
1 code implementation • 9 Mar 2025 • Junyi Wu, Zhiteng Li, Zheng Hui, Yulun Zhang, Linghe Kong, Xiaokang Yang
Recently, Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation, surpassing U-Net-based models in terms of performance.
no code implementations • 8 Mar 2025 • Jiaming Liu, Linghe Kong, Guihai Chen
The Segment Anything Model (SAM) has shown impressive general-purpose segmentation performance on natural images, but its performance on camouflaged object detection (COD) is unsatisfactory.
1 code implementation • 21 Feb 2025 • Kai Liu, Dehui Wang, Zhiteng Li, Zheng Chen, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang
Experimentally, we observe that the degradation from quantization is mainly attributed to the quantization of activations rather than model weights.
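A toy illustration of that observation under assumed conditions (naive 4-bit min-max quantization, synthetic tensors with outlier activation channels); it is not the paper's experiment:

```python
# Compare error from quantizing weights vs. activations of a linear layer
# with naive 4-bit uniform quantization. Synthetic data, illustrative only.
import torch

def quantize(x, bits=4):
    scale = x.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(x / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

torch.manual_seed(0)
w = torch.randn(256, 256) * 0.02              # well-behaved weights
a = torch.randn(32, 256)
a[:, :8] *= 50.0                              # activations with outlier channels

ref = a @ w.t()
err_w = (a @ quantize(w).t() - ref).pow(2).mean()   # weight-only quantization
err_a = (quantize(a) @ w.t() - ref).pow(2).mean()   # activation-only quantization
print(f"weight-quant MSE {err_w:.4f} vs activation-quant MSE {err_a:.4f}")
```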
1 code implementation • 1 Feb 2025 • Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang
One is to compress the DM to 1-bit precision, i.e., binarization, alleviating storage and computation pressure.
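For orientation, a generic sign-plus-scale weight binarization sketch (the classic XNOR-style scheme); the paper's actual binarization may differ:

```python
# Generic 1-bit weight binarization: weights become {-1, +1} plus one
# floating-point scale. Shown for illustration only.
import torch

def binarize(w: torch.Tensor):
    alpha = w.abs().mean()           # scalar scale minimizing L1 reconstruction error
    return alpha * torch.sign(w)     # 1-bit weights with a single scaling factor

w = torch.randn(128, 128)
w_bin = binarize(w)
print("reconstruction MSE:", (w - w_bin).pow(2).mean().item())
```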
no code implementations • 30 Jan 2025 • Chunming He, Rihan Zhang, Fengyang Xiao, Chenyu Fang, Longxiang Tang, Yulun Zhang, Linghe Kong, Deng-Ping Fan, Kai Li, Sina Farsiu
To address this, we propose the Reversible Unfolding Network (RUN), which applies reversible strategies across both mask and RGB domains through a theoretically grounded framework, enabling accurate segmentation.
no code implementations • 3 Nov 2024 • Tianhao Peng, Yuchen Li, Xuhong LI, Jiang Bian, Zeke Xie, Ning Sui, Shahid Mumtaz, Yanwu Xu, Linghe Kong, Haoyi Xiong
Recent advancements in computational chemistry have leveraged the power of transformer-based language models, such as MoLFormer, pre-trained on vast amounts of simplified molecular-input line-entry system (SMILES) sequences, to understand and predict molecular properties and activities, a critical step in fields like drug discovery and materials science.
1 code implementation • 4 Oct 2024 • Zhiteng Li, Xianglong Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Zhongchao Shi, Linghe Kong, Yulun Zhang, Xiaokang Yang
However, current binarization methods struggle to narrow the distribution gap between binarized and full-precision weights, while also overlooking the column deviation in LLM weight distribution.
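The column deviation mentioned above can be illustrated with a per-column scaling sketch (one scale per output column instead of a single per-tensor scale); the synthetic weight matrix and scaling choice below are assumptions for illustration, not the paper's algorithm:

```python
# Illustrative only: binarize with one scale per column so that columns with
# very different magnitudes are not forced to share a single scaling factor.
import torch

def binarize_per_column(w: torch.Tensor):
    alpha = w.abs().mean(dim=0, keepdim=True)   # one scale per column
    return alpha * torch.sign(w)

w = torch.randn(64, 64) * torch.linspace(0.1, 5.0, 64)  # columns with varying scale
per_tensor = w.abs().mean() * torch.sign(w)
per_column = binarize_per_column(w)
print("per-tensor MSE:", (w - per_tensor).pow(2).mean().item())
print("per-column MSE:", (w - per_column).pow(2).mean().item())
```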
1 code implementation • 3 Oct 2024 • Kai Liu, Ziqing Zhang, Wenbo Li, Renjing Pei, Fenglong Song, Xiaohong Liu, Linghe Kong, Yulun Zhang
Image quality assessment (IQA) serves as the gold standard for measuring model performance in nearly all computer vision fields.
1 code implementation • 26 Sep 2024 • Jiaming Liu, Linghe Kong, Yue Wu, Maoguo Gong, Hao Li, Qiguang Miao, Wenping Ma, Can Qin
Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation.
no code implementations • 25 Sep 2024 • Yuchen Li, Haoyi Xiong, Linghe Kong, Zeyi Sun, Hongyang Chen, Shuaiqiang Wang, Dawei Yin
Both Transformer and Graph Neural Networks (GNNs) have been employed in the domain of learning to rank (LTR).
no code implementations • 25 Sep 2024 • Yuchen Li, Haoyi Xiong, Linghe Kong, Jiang Bian, Shuaiqiang Wang, Guihai Chen, Dawei Yin
Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries.
1 code implementation • 10 Jun 2024 • Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively.
1 code implementation • 9 Jun 2024 • Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, Yulun Zhang
Nonetheless, due to the model structure and the multi-step iterative attribute of DMs, existing binarization methods result in significant performance degradation.
1 code implementation • 8 Jun 2024 • Chaolv Zeng, Zhanyu Liu, Guanjie Zheng, Linghe Kong
Capturing cross-channel dependencies is critical in enhancing the performance of multivariate time series prediction.
1 code implementation • 4 Jun 2024 • Jianrong Ding, Zhanyu Liu, Guanjie Zheng, Haiming Jin, Linghe Kong
To bridge this gap, we theoretically analyze the optimization objective of dataset condensation for time-series forecasting and, based on this analysis, propose a new one-line dataset-condensation plugin, Dataset Condensation for Time Series Forecasting (CondTSF).
no code implementations • 28 May 2024 • Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu
Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures.
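A rough sketch of the kind of dynamic adapter described here: a router softly combines several low-rank (LoRA) experts on top of a frozen base linear layer. Shapes, names, and the dense-then-gather computation are hypothetical simplifications, not the paper's implementation:

```python
# LoRA Mixture-of-Experts sketch: frozen base weight + routed low-rank deltas.
import torch
import torch.nn as nn

class LoRAMoE(nn.Module):
    def __init__(self, d_in=64, d_out=64, rank=4, n_experts=4, top_k=2):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad = False             # frozen backbone weight
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.router = nn.Linear(d_in, n_experts)
        self.top_k = top_k

    def forward(self, x):                                   # x: (batch, d_in)
        gate = self.router(x)                               # (batch, n_experts)
        val, idx = gate.topk(self.top_k, dim=-1)
        weight = torch.softmax(val, dim=-1)                 # weights of chosen experts
        out = self.base(x)
        # All experts computed densely here for brevity; a real MoE routes sparsely.
        delta = torch.einsum("bi,eir,ero->beo", x, self.A, self.B)
        chosen = torch.gather(delta, 1, idx.unsqueeze(-1).expand(-1, -1, delta.size(-1)))
        return out + (weight.unsqueeze(-1) * chosen).sum(dim=1)

x = torch.randn(8, 64)
print(LoRAMoE()(x).shape)   # torch.Size([8, 64])
```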
1 code implementation • 24 Nov 2023 • Zhiteng Li, Yulun Zhang, Jing Lin, Haotong Qin, Jinjin Gu, Xin Yuan, Linghe Kong, Xiaokang Yang
In this work, we propose BinaryHPE, a novel binarization method designed to efficiently estimate 3D human body, face, and hand parameters.
1 code implementation • 24 Nov 2023 • Zheng Chen, Yulun Zhang, Jinjin Gu, Xin Yuan, Linghe Kong, Guihai Chen, Xiaokang Yang
Specifically, we first design a text-image generation pipeline to integrate text into the SR dataset through the text degradation representation and degradation model.
no code implementations • 24 Sep 2023 • Haoyi Xiong, Jiang Bian, Sijia Yang, Xiaofei Zhang, Linghe Kong, Daqing Zhang
Recently, with the rise of LLMs and their improved natural language understanding and reasoning capabilities, it has become feasible to model contexts using natural language and perform context reasoning by interacting with LLMs such as ChatGPT and GPT-4.
no code implementations • 9 Sep 2023 • Qiao Xiang, Yuling Lin, Mingjun Fang, Bang Huang, Siyong Huang, Ridi Wen, Franck Le, Linghe Kong, Jiwu Shu
Reproducing research results in the networking community is important for both academia and industry.
no code implementations • 29 Aug 2023 • Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Xiaozhou Ye, Ye Ouyang, Linghe Kong, Yunxin Liu
Mixture of experts (MoE) is a popular technique for improving the capacity of large language models (LLMs) with conditionally activated parallel experts.
1 code implementation • ICCV 2023 • Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang, Fisher Yu
Based on the above idea, we propose a novel Transformer model, the Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in an inter-block and intra-block dual manner.
Ranked #11 on Image Super-Resolution on Manga109 - 4x upscaling
1 code implementation • NeurIPS 2023 • Zheng Chen, Yulun Zhang, Ding Liu, Bin Xia, Jinjin Gu, Linghe Kong, Xin Yuan
Specifically, we perform the DM in a highly compacted latent space to generate the prior feature for the deblurring process.
1 code implementation • 11 Mar 2023 • Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang
In this work, we propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.
Ranked #10 on Image Super-Resolution on Manga109 - 4x upscaling
1 code implementation • 11 Mar 2023 • Jiale Zhang, Yulun Zhang, Jinjin Gu, Jiahua Dong, Linghe Kong, Xiaokang Yang
The channel-wise Transformer block performs direct global context interactions across tokens defined by channel dimension.
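A hedged sketch of channel-wise attention in that spirit: each channel is treated as a token, so attention mixes channels while each token carries global spatial context. Projections are omitted and the shapes are arbitrary; this is not the authors' exact block:

```python
# Channel-wise (transposed) self-attention sketch: tokens along the channel axis.
import torch
import torch.nn.functional as F

b, c, h, w = 2, 16, 8, 8
x = torch.randn(b, c, h, w).flatten(2)        # (b, c, h*w): each channel is a token
q, k, v = x, x, x                             # shared projections omitted for brevity
attn = F.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)   # (b, c, c)
out = (attn @ v).reshape(b, c, h, w)          # channels mixed with global context
print(out.shape)   # torch.Size([2, 16, 8, 8])
```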
3 code implementations • 24 Nov 2022 • Zheng Chen, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan
The core of our CAT is the Rectangle-Window Self-Attention (Rwin-SA), which utilizes horizontal and vertical rectangle window attention in different heads in parallel to expand the attention area and aggregate features across different windows.
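To make the rectangle-window idea concrete, a toy partition of a feature map into horizontal versus vertical rectangle windows (the regions different heads would attend within); the window sizes here are arbitrary and not the paper's configuration:

```python
# Partition a feature map into non-overlapping rectangle windows.
import torch

def rect_windows(x, wh, ww):
    b, c, h, w = x.shape                                        # assumes h % wh == 0 and w % ww == 0
    x = x.view(b, c, h // wh, wh, w // ww, ww)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(-1, c, wh, ww)   # (num_windows * b, c, wh, ww)

feat = torch.randn(1, 8, 16, 16)
horizontal = rect_windows(feat, 4, 16)   # wide, short windows
vertical = rect_windows(feat, 16, 4)     # tall, narrow windows
print(horizontal.shape, vertical.shape)  # torch.Size([4, 8, 4, 16]) torch.Size([4, 8, 16, 4])
```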
1 code implementation • 4 Oct 2022 • Jiale Zhang, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan
This is considered a dense attention strategy, since token interactions are restricted to dense regions.
1 code implementation • 28 Jan 2019 • Zihan Ding, Xiao-Yang Liu, Miao Yin, Linghe Kong
Second, we propose TGAN, which integrates deep convolutional generative adversarial networks and tensor super-resolution in a cascading manner to generate high-quality images from random distributions.