Search Results for author: Xinghao Chen

Found 53 papers, 30 papers with code

MultiConIR: Towards multi-condition Information Retrieval

1 code implementation 11 Mar 2025 Xuan Lu, Sifan Liu, Bochao Yin, Yongqi Li, Xinghao Chen, Hui Su, Yaohui Jin, Wenjun Zeng, Xiaoyu Shen

In this paper, we introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios.

Information Retrieval Retrieval

Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning

no code implementations 8 Mar 2025 Yanjun Chen, Yirong Sun, Xinghao Chen, Jian Wang, Xiaoyu Shen, Wenjie Li, Wei Zhang

Chain-of-Thought (CoT) reasoning has proven effective in natural language tasks but remains underexplored in multimodal alignment.

Multimodal Reasoning

Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model

1 code implementation 2 Dec 2024 Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen

The teacher also helps the student learn the projection of vision tokens into the text embedding space based on the focus of the text.

cross-modal alignment Knowledge Distillation +2

TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba

1 code implementation 26 Nov 2024 Xiaowen Ma, ZhenLiang Ni, Xinghao Chen

Based on the analyses, we introduce a novel Laplace mixer to decouple the features in terms of frequency and input only the low-frequency components into the Mamba block.

Image Classification Instance Segmentation +4

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

1 code implementation 9 Oct 2024 Yanjun Chen, Dawei Zhu, Yirong Sun, Xinghao Chen, Wei Zhang, Xiaoyu Shen

Reinforcement Learning from Human Feedback significantly enhances Natural Language Processing by aligning language models with human expectations.

Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization

1 code implementation 12 Jul 2024 Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen

However, the quality of pseudo labels in the framework, which is a key factor in the final result, has not been carefully studied.

Contrastive Learning Pseudo Label +2

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

1 code implementation 17 Jun 2024 Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang

For instance, we achieve approximately $70\times$ compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks.

Model Optimization Quantization

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

3 code implementations 19 May 2024 Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang

However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and training collapse.

Image Classification Language Modeling +3

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

3 code implementations 14 May 2024 Yingjie Zhai, Wenshuo Li, Yehui Tang, Xinghao Chen, Yunhe Wang

In this paper, we propose to squeeze the time axis of a video sequence into the channel dimension and present a lightweight video recognition network, termed SqueezeTime, for mobile video understanding.

Action Detection Video Recognition +1

SSA-Seg: Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation

3 code implementations 10 May 2024 Xiaowen Ma, ZhenLiang Ni, Xinghao Chen

Specifically, we employ the coarse masks obtained from the fixed prototypes as a guide to adjust the fixed prototype towards the center of the semantic and spatial domains in the test image.

Semantic Segmentation

DECO: Query-Based End-to-End Object Detection with ConvNets

3 code implementations 21 Dec 2023 Xinghao Chen, Siwei Li, Yijing Yang, Yunhe Wang

The proposed framework, i.e., Detection ConvNet (DECO), is composed of a backbone and a convolutional encoder-decoder architecture.

Decoder Object +2

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

2 code implementations 21 Dec 2023 Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen

Numerous follow-up works have developed various applications based on the pre-trained SAM and achieved impressive performance on downstream vision tasks.

Knowledge Distillation Quantization

PPT: Token Pruning and Pooling for Efficient Vision Transformers

2 code implementations 3 Oct 2023 Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen

Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks.

Efficient ViTs

Less is More: Focus Attention for Efficient DETR

4 code implementations ICCV 2023 Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang

DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models.

BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons

5 code implementations 29 Dec 2022 Yixing Xu, Xinghao Chen, Yunhe Wang

This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs).

Binarization

AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets

3 code implementations 17 Aug 2022 Zhijun Tu, Xinghao Chen, Pengju Ren, Yunhe Wang

Because modern deep neural networks use sophisticated designs with complex architectures for the sake of accuracy, the diversity of the weight and activation distributions is very high.

Classification with Binary Neural Network Quantization

Multimodal Token Fusion for Vision Transformers

11 code implementations journal 2022 Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang

Many adaptations of transformers have emerged to address the single-modal vision tasks, where self-attention modules are stacked to handle input sources like images.

3D Object Detection Image-to-Image Translation +2

MaskGroup: Hierarchical Point Grouping and Masking for 3D Instance Segmentation

no code implementations 28 Mar 2022 Min Zhong, Xinghao Chen, Xiaokang Chen, Gang Zeng, Yunhe Wang

For instance, our approach achieves a 66.4% mAP with the 0.5 IoU threshold on the ScanNetV2 test set, which is 1.9% higher than the state-of-the-art method.

3D Instance Segmentation Semantic Segmentation

AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification

no code implementations CVPR 2022 Hongyang Gu, Jianmin Li, Guangyuan Fu, Chifong Wong, Xinghao Chen, Jun Zhu

In this paper, we propose a novel method, AutoLoss-GMS, to automatically search for a better loss function in the space of generalized margin-based softmax loss functions for person re-identification.

Person Re-Identification

An Empirical Study of Adder Neural Networks for Object Detection

no code implementations NeurIPS 2021 Xinghao Chen, Chang Xu, Minjing Dong, Chunjing Xu, Yunhe Wang

Adder neural networks (AdderNets) have shown impressive performance on image classification with only addition operations, which are more energy efficient than traditional convolutional neural networks built with multiplications.

Autonomous Driving Face Detection +3

Towards Stable and Robust AdderNets

no code implementations NeurIPS 2021 Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu

Adder neural network (AdderNet) replaces the original convolutions, which involve massive multiplications, with cheap additions while achieving comparable performance, thus yielding a series of energy-efficient neural networks.

Adversarial Robustness

Handling Long-tailed Feature Distribution in AdderNets

no code implementations NeurIPS 2021 Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu

Adder neural networks (ANNs) are designed for low energy cost: they replace expensive multiplications in convolutional neural networks (CNNs) with cheaper additions to yield energy-efficient neural networks and hardware acceleration.

Knowledge Distillation

Hire-MLP: Vision MLP via Hierarchical Rearrangement

10 code implementations CVPR 2022 Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang

Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, making them inflexible for different input sizes and hard to capture spatial information.

Image Classification object-detection +2

CMT: Convolutional Neural Networks Meet Vision Transformers

14 code implementations CVPR 2022 Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu

Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.

Positive-Unlabeled Data Purification in the Wild for Object Detection

no code implementations CVPR 2021 Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Xinghao Chen, Chunjing Xu, Chang Xu, Yunhe Wang

In this paper, we present a positive-unlabeled learning based scheme to expand training data by purifying valuable images from massive unlabeled ones, where the original training data are viewed as positive data and the unlabeled images in the wild are unlabeled data.

Knowledge Distillation object-detection +1

Data-Free Knowledge Distillation for Image Super-Resolution

no code implementations CVPR 2021 Yiman Zhang, Hanting Chen, Xinghao Chen, Yiping Deng, Chunjing Xu, Yunhe Wang

Experiments on various datasets and architectures demonstrate that the proposed method can be used to effectively learn portable student networks without the original data, e.g., with a 0.16 dB PSNR drop on Set5 for x2 super-resolution.

Data-free Knowledge Distillation Image Super-Resolution +1

Winograd Algorithm for AdderNet

no code implementations 12 May 2021 Wenshuo Li, Hanting Chen, Mingqiang Huang, Xinghao Chen, Chunjing Xu, Yunhe Wang

Adder neural network (AdderNet) is a new kind of deep model that replaces the original massive multiplications in convolutions by additions while preserving the high performance.

valid

Distilling Object Detectors via Decoupled Features

1 code implementation CVPR 2021 Jianyuan Guo, Kai Han, Yunhe Wang, Han Wu, Xinghao Chen, Chunjing Xu, Chang Xu

To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.

Image Classification Knowledge Distillation +3

A Survey on Visual Transformer

no code implementations 23 Dec 2020 Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, DaCheng Tao

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism.

Image Classification Inductive Bias +1

Kernel Based Progressive Distillation for Adder Neural Networks

no code implementations NeurIPS 2020 Yixing Xu, Chang Xu, Xinghao Chen, Wei Zhang, Chunjing Xu, Yunhe Wang

A convolutional neural network (CNN) with the same architecture is simultaneously initialized and trained as a teacher network; the features and weights of the ANN and CNN are transformed to a new space to eliminate the accuracy drop.

Knowledge Distillation

MTP: Multi-Task Pruning for Efficient Semantic Segmentation Networks

no code implementations 16 Jul 2020 Xinghao Chen, Yiman Zhang, Yunhe Wang

To identify the redundancy in segmentation networks, we present a multi-task channel pruning approach.

Classification General Classification +3

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

6 code implementations CVPR 2021 Zhaohui Yang, Yunhe Wang, Xinghao Chen, Jianyuan Guo, Wei Zhang, Chao Xu, Chunjing Xu, DaCheng Tao, Chang Xu

To achieve an extremely fast NAS while preserving the high accuracy, we propose to identify the vital blocks and make them the priority in the architecture search.

Neural Architecture Search

Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

1 code implementation CVPR 2020 Jianyuan Guo, Kai Han, Yunhe Wang, Chao Zhang, Zhaohui Yang, Han Wu, Xinghao Chen, Chang Xu

To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e., backbone, neck, and head) of an object detector in an end-to-end manner.

Image Classification Neural Architecture Search +3

CARS: Continuous Evolution for Efficient Neural Architecture Search

1 code implementation CVPR 2020 Zhaohui Yang, Yunhe Wang, Xinghao Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, Chang Xu

Architectures in the population that share parameters within one SuperNet in the latest generation will be tuned over the training dataset with a few epochs.

Neural Architecture Search

Bi-stream Pose Guided Region Ensemble Network for Fingertip Localization from Stereo Images

no code implementations 26 Feb 2019 Guijin Wang, Cairong Zhang, Xinghao Chen, Xiangyang Ji, Jing-Hao Xue, Hang Wang

To mitigate these limitations and promote further research on hand pose estimation from stereo images, we propose a new large-scale binocular hand pose dataset called THU-Bi-Hand, offering a new perspective for fingertip localization.

3D Hand Pose Estimation Missing Values

Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images

no code implementations 26 Apr 2018 Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang

Unlike previous works, we propose a new framework, named Two-Stream Binocular Network (TSBnet), to detect fingertips directly from binocular images.

Fingertip Detection Hand Pose Estimation

Interactive Hand Pose Estimation: Boosting accuracy in localizing extended finger joints

no code implementations 2 Apr 2018 Cairong Zhang, Guijin Wang, Hengkai Guo, Xinghao Chen, Fei Qiao, Huazhong Yang

In practical HMI, the joints of outstretched fingers, especially the corresponding fingertips, are much more important than other joints.

3D Hand Pose Estimation

Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation

1 code implementation 11 Aug 2017 Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang

The proposed method extracts regions from the feature maps of a convolutional neural network under the guidance of an initially estimated pose, generating more optimal and representative features for hand pose estimation.

Hand Pose Estimation

Towards Good Practices for Deep 3D Hand Pose Estimation

no code implementations 23 Jul 2017 Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang

3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction.

3D Hand Pose Estimation Data Augmentation +1
