Search Results for author: Linghe Kong

Found 36 papers, 23 papers with code

Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning

2 code implementations • 11 Jun 2025 • Yuting Li, Lai Wei, Kaipeng Zheng, Jingyuan Huang, Linghe Kong, Lichao Sun, Weiran Huang

Our findings highlight the critical role of visual perturbation in multimodal mathematical reasoning: better reasoning begins with better seeing.

Image Captioning • Math • +2

Segment Concealed Objects with Incomplete Supervision

no code implementations • 10 Jun 2025 • Chunming He, Kai Li, Yachao Zhang, Ziyun Yang, Youwei Pang, Longxiang Tang, Chengyu Fang, Yulun Zhang, Linghe Kong, Xiu Li, Sina Farsiu

To mitigate the effect of low-quality segmentation masks, we introduce a series of strategies for pseudo-label generation, storage, and supervision.

Pseudo Label • Segmentation • +1

ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration

no code implementations • 30 May 2025 • Xianglong Yan, Zhiteng Li, Tianao Zhang, Linghe Kong, Yulun Zhang, Xiaokang Yang

Recent methods have explored reducing the hidden dimensions of the KV cache, but many introduce additional computation through projection layers or suffer from significant performance degradation under high compression ratios.

Low-rank compression
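
As background for this entry, the sketch below illustrates the general idea of low-rank KV-cache compression: factorizing a per-head key (or value) matrix with a truncated SVD so that two thin factors are stored instead of the full cache. It is a minimal, generic NumPy illustration under assumed shapes and names; it does not implement ReCalKV's head reordering or offline calibration.

```python
import numpy as np

def low_rank_compress_kv(K: np.ndarray, rank: int):
    """Compress a key (or value) cache matrix K of shape (seq_len, head_dim)
    into two low-rank factors via truncated SVD. Storing A (seq_len x rank)
    and B (rank x head_dim) instead of K saves memory when rank << head_dim."""
    U, S, Vt = np.linalg.svd(K, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # (seq_len, rank)
    B = Vt[:rank, :]                    # (rank, head_dim)
    return A, B

# Toy usage: a 128-token cache with head_dim 64, compressed to rank 16.
rng = np.random.default_rng(0)
K = rng.standard_normal((128, 64))
A, B = low_rank_compress_kv(K, rank=16)
K_hat = A @ B                           # approximate cache used at attention time
print("relative error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```

Storing A and B costs seq_len·r + r·head_dim floats instead of seq_len·head_dim, which is where the memory saving comes from when r is small.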

Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

2 code implementations • 28 May 2025 • Lai Wei, Yuting Li, Chen Wang, Yue Wang, Linghe Kong, Weiran Huang, Lichao Sun

Overall, MM-UPT offers a new paradigm for continual, autonomous enhancement of MLLMs in the absence of external supervision.

Math • Reinforcement Learning (RL)
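
Since this entry centers on GRPO, the sketch below shows the group-relative advantage computation at the heart of GRPO: several responses sampled for the same prompt are scored, and each reward is standardized against the group mean and standard deviation instead of a learned value function. This is a generic illustration (the function name and the toy reward vector are hypothetical), not the MM-UPT training code; in an unsupervised setting the reward could, for instance, come from majority voting over the sampled answers rather than ground-truth labels.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages as used in GRPO: for G responses sampled
    for the same prompt, standardize each reward against the group mean
    and std (no value network is required)."""
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-6)

# Toy usage: G = 4 responses to one prompt with a hypothetical 0/1 reward.
rewards = np.array([1.0, 0.0, 1.0, 1.0])
print(grpo_advantages(rewards))
```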

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

2 code implementations • 28 May 2025 • Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang

Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress.

Math • Multimodal Reasoning • +3

DVD-Quant: Data-free Video Diffusion Transformers Quantization

1 code implementation • 24 May 2025 • Zhiteng Li, Hanxuan Li, Junyi Wu, Kai Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment.

Data Free Quantization • Video Generation

Low-bit Model Quantization for Deep Neural Networks: A Survey

no code implementations • 8 May 2025 • Kai Liu, Qian Zheng, Kaiwen Tao, Zhiteng Li, Haotong Qin, Wenbo Li, Yong Guo, Xianglong Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

Therefore, it has become increasingly popular and critical to investigate how to perform the conversion and how to compensate for the information loss.

Quantization
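
To make the survey's framing concrete, here is a textbook sketch of uniform affine quantization, the basic full-precision-to-low-bit conversion whose round-trip error is the information loss that low-bit methods then try to compensate for (e.g. via calibration or quantization-aware training). It is a generic NumPy illustration, not a method from the survey.

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, num_bits: int = 4) -> np.ndarray:
    """Uniform affine quantization: map floats to {0, ..., 2^b - 1} integers
    with a scale and zero-point, then map back to floats. The gap between
    x and the round-trip result is the quantization error."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)
for bits in (8, 4, 2):
    err = np.mean((w - quantize_dequantize(w, bits)) ** 2)
    print(f"{bits}-bit MSE: {err:.6f}")   # error grows as the bit-width shrinks
```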

Enhanced Continual Learning of Vision-Language Models with Model Fusion

no code implementations • 12 Mar 2025 • Haoyuan Gao, Zicong Zhang, Yuqi Wei, Linglan Zhao, Guilin Li, Yexin Li, Linghe Kong, Weiran Huang

Vision-Language Models (VLMs) represent a breakthrough in artificial intelligence by integrating visual and textual modalities to achieve impressive zero-shot capabilities.

Continual Learning • parameter-efficient fine-tuning

QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation

1 code implementation • 9 Mar 2025 • Junyi Wu, Zhiteng Li, Zheng Hui, Yulun Zhang, Linghe Kong, Xiaokang Yang

Recently, Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation, surpassing U-Net-based models in terms of performance.

Quantization • Video Generation

Improving SAM for Camouflaged Object Detection via Dual Stream Adapters

no code implementations • 8 Mar 2025 • Jiaming Liu, Linghe Kong, Guihai Chen

The Segment Anything Model (SAM) has shown impressive general-purpose segmentation performance on natural images, but its performance on camouflaged object detection (COD) is unsatisfactory.

Knowledge Distillation • object-detection • +1

CondiQuant: Condition Number Based Low-Bit Quantization for Image Super-Resolution

1 code implementation • 21 Feb 2025 • Kai Liu, Dehui Wang, Zhiteng Li, Zheng Chen, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang

Experimentally, we observe that the degradation of quantization is mainly attributed to the quantization of activation instead of model weights.

Image Super-Resolution • Quantization

RUN: Reversible Unfolding Network for Concealed Object Segmentation

no code implementations • 30 Jan 2025 • Chunming He, Rihan Zhang, Fengyang Xiao, Chengyu Fang, Longxiang Tang, Yulun Zhang, Linghe Kong, Deng-Ping Fan, Kai Li, Sina Farsiu

To address this, we propose the Reversible Unfolding Network (RUN), which applies reversible strategies across both mask and RGB domains through a theoretically grounded framework, enabling accurate segmentation.

Object Segmentation +1

Pre-trained Molecular Language Models with Random Functional Group Masking

no code implementations • 3 Nov 2024 • Tianhao Peng, Yuchen Li, Xuhong Li, Jiang Bian, Zeke Xie, Ning Sui, Shahid Mumtaz, Yanwu Xu, Linghe Kong, Haoyi Xiong

Recent advancements in computational chemistry have leveraged the power of transformer-based language models, such as MoLFormer, pre-trained using a vast amount of simplified molecular-input line-entry system (SMILES) sequences, to understand and predict molecular properties and activities, a critical step in fields like drug discovery and materials science.

Computational chemistry • Drug Discovery

ARB-LLM: Alternating Refined Binarizations for Large Language Models

1 code implementation • 4 Oct 2024 • Zhiteng Li, Xianglong Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Zhongchao Shi, Linghe Kong, Yulun Zhang, Xiaokang Yang

However, current binarization methods struggle to narrow the distribution gap between binarized and full-precision weights, while also overlooking the column deviation in LLM weight distribution.

Binarization • Quantization
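
For context, the sketch below shows the classic 1-bit weight binarization baseline that methods such as ARB-LLM refine: each column of a weight matrix is replaced by its sign pattern times a per-column scaling factor α = mean(|w|), the closed-form minimizer of the per-column L2 gap. This is only the standard baseline in NumPy; it does not implement ARB-LLM's alternating refinement or its handling of column deviation.

```python
import numpy as np

def binarize_per_column(W: np.ndarray):
    """Classic 1-bit weight binarization: W ≈ alpha * sign(W), with one
    scaling factor per column, alpha_j = mean(|W[:, j]|), which is the
    closed-form minimizer of ||W[:, j] - alpha_j * sign(W[:, j])||_2."""
    signs = np.sign(W)
    signs[signs == 0] = 1.0                         # avoid zero sign bits
    alpha = np.abs(W).mean(axis=0, keepdims=True)   # (1, out_features)
    return signs, alpha

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
signs, alpha = binarize_per_column(W)
W_bin = signs * alpha
print("relative error:", np.linalg.norm(W - W_bin) / np.linalg.norm(W))
```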

Triple Point Masking

1 code implementation • 26 Sep 2024 • Jiaming Liu, Linghe Kong, Yue Wu, Maoguo Gong, Hao Li, Qiguang Miao, Wenping Ma, Can Qin

Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation.

Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)

no code implementations • 25 Sep 2024 • Yuchen Li, Haoyi Xiong, Linghe Kong, Jiang Bian, Shuaiqiang Wang, Guihai Chen, Dawei Yin

Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries.

Learning-To-Rank

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

1 code implementation • 10 Jun 2024 • Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively.

Image Super-Resolution • Quantization

Binarized Diffusion Model for Image Super-Resolution

1 code implementation • 9 Jun 2024 • Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, Yulun Zhang

Nonetheless, due to the model structure and the multi-step iterative attribute of DMs, existing binarization methods result in significant performance degradation.

Attribute • Binarization • +3

CMamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting

1 code implementation • 8 Jun 2024 • Chaolv Zeng, Zhanyu Liu, Guanjie Zheng, Linghe Kong

Capturing cross-channel dependencies is critical in enhancing the performance of multivariate time series prediction.

Attribute • Mamba • +4

CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

1 code implementation • 4 Jun 2024 • Jianrong Ding, Zhanyu Liu, Guanjie Zheng, Haiming Jin, Linghe Kong

To mitigate this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and propose a new one-line plugin of dataset condensation designated as Dataset Condensation for Time Series Forecasting (CondTSF) based on our analysis.

Dataset Condensation • Time Series • +1

LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design

no code implementations • 28 May 2024 • Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu

Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures.

Mixture-of-Experts
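
Since LoRA-Switch concerns serving many dynamic low-rank adapters efficiently, a reminder of what a single LoRA adapter adds to a frozen linear layer may help. The sketch below is the generic LoRA forward pass in NumPy (all names and shapes are illustrative), not LoRA-Switch's system-level co-design.

```python
import numpy as np

def lora_forward(x, W_frozen, A, B, scaling=1.0):
    """Frozen linear layer plus a low-rank LoRA update:
    y = x @ W_frozen.T + scaling * (x @ A.T) @ B.T
    Only A (r x in_features) and B (out_features x r) are trained, so a
    dynamic-adapter system can keep many such pairs and switch or merge
    them at low memory cost."""
    return x @ W_frozen.T + scaling * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
in_f, out_f, r = 64, 32, 4
W = rng.standard_normal((out_f, in_f))       # frozen pretrained weight
A = rng.standard_normal((r, in_f)) * 0.01    # LoRA down-projection
B = np.zeros((out_f, r))                     # LoRA up-projection (zero init)
x = rng.standard_normal((8, in_f))
print(lora_forward(x, W, A, B).shape)        # (8, 32)
```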

BinaryHPE: 3D Human Pose and Shape Estimation via Binarization

1 code implementation • 24 Nov 2023 • Zhiteng Li, Yulun Zhang, Jing Lin, Haotong Qin, Jinjin Gu, Xin Yuan, Linghe Kong, Xiaokang Yang

In this work, we propose BinaryHPE, a novel binarization method designed to estimate the 3D human body, face, and hands parameters efficiently.

3D human pose and shape estimation • Binarization • +2

Image Super-Resolution with Text Prompt Diffusion

1 code implementation • 24 Nov 2023 • Zheng Chen, Yulun Zhang, Jinjin Gu, Xin Yuan, Linghe Kong, Guihai Chen, Xiaokang Yang

Specifically, we first design a text-image generation pipeline to integrate text into the SR dataset through the text degradation representation and degradation model.

Image Generation • Image Super-Resolution • +2

Natural Language based Context Modeling and Reasoning for Ubiquitous Computing with Large Language Models: A Tutorial

no code implementations • 24 Sep 2023 • Haoyi Xiong, Jiang Bian, Sijia Yang, Xiaofei Zhang, Linghe Kong, Daqing Zhang

Recently, with the rise of LLMs and their improved natural language understanding and reasoning capabilities, it has become feasible to model contexts using natural language and perform context reasoning by interacting with LLMs such as ChatGPT and GPT-4.

Natural Language Understanding • Scheduling

SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget

no code implementations • 29 Aug 2023 • Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Xiaozhou Ye, Ye Ouyang, Linghe Kong, Yunxin Liu

Mixture of experts (MoE) is a popular technique to improve the capacity of large language models (LLMs) with conditionally activated parallel experts.

Mixture-of-Experts • object-detection • +2

Dual Aggregation Transformer for Image Super-Resolution

1 code implementation • ICCV 2023 • Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang, Fisher Yu

Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in the inter-block and intra-block dual manner.

Image Super-Resolution

Hierarchical Integration Diffusion Model for Realistic Image Deblurring

1 code implementation • NeurIPS 2023 • Zheng Chen, Yulun Zhang, Ding Liu, Bin Xia, Jinjin Gu, Linghe Kong, Xin Yuan

Specifically, we perform the DM in a highly compacted latent space to generate the prior feature for the deblurring process.

Deblurring • Image Deblurring • +2

Recursive Generalization Transformer for Image Super-Resolution

1 code implementation • 11 Mar 2023 • Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang

In this work, we propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.

Image Reconstruction • Image Super-Resolution

Xformer: Hybrid X-Shaped Transformer for Image Denoising

1 code implementation • 11 Mar 2023 • Jiale Zhang, Yulun Zhang, Jinjin Gu, Jiahua Dong, Linghe Kong, Xiaokang Yang

The channel-wise Transformer block performs direct global context interactions across tokens defined by channel dimension.

Decoder • Image Denoising

Cross Aggregation Transformer for Image Restoration

3 code implementations • 24 Nov 2022 • Zheng Chen, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan

The core of our CAT is the Rectangle-Window Self-Attention (Rwin-SA), which applies horizontal and vertical rectangle-window attention in different heads in parallel to expand the attention area and aggregate features across different windows.

Image Restoration • Inductive Bias
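
To visualize the rectangle-window idea described above, the sketch below partitions a feature map into non-overlapping rectangular windows; using a wide window for some heads and a tall window for others is how horizontal and vertical attention areas can be expanded in parallel. This is a generic NumPy illustration of window partitioning (function name and shapes are hypothetical), not the CAT implementation.

```python
import numpy as np

def rectangle_window_partition(x, win_h, win_w):
    """Split a (H, W, C) feature map into non-overlapping (win_h, win_w)
    windows; self-attention is then computed inside each window. A wide
    window (e.g. 4x16) and a tall window (e.g. 16x4) cover different axes."""
    H, W, C = x.shape
    x = x.reshape(H // win_h, win_h, W // win_w, win_w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win_h * win_w, C)

feat = np.arange(32 * 32 * 8, dtype=np.float32).reshape(32, 32, 8)
horiz = rectangle_window_partition(feat, 4, 16)   # (16, 64, 8) horizontal windows
vert = rectangle_window_partition(feat, 16, 4)    # (16, 64, 8) vertical windows
print(horiz.shape, vert.shape)
```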

Accurate Image Restoration with Attention Retractable Transformer

1 code implementation • 4 Oct 2022 • Jiale Zhang, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan

This is considered as a dense attention strategy since the interactions of tokens are restrained in dense regions.

Denoising • Image Restoration • +2

TGAN: Deep Tensor Generative Adversarial Nets for Large Image Generation

1 code implementation • 28 Jan 2019 • Zihan Ding, Xiao-Yang Liu, Miao Yin, Linghe Kong

Secondly, we propose TGAN that integrates deep convolutional generative adversarial networks and tensor super-resolution in a cascading manner, to generate high-quality images from random distributions.

Dictionary Learning • Image Generation • +1
