1 code implementation • 11 Mar 2025 • Xuan Lu, Sifan Liu, Bochao Yin, Yongqi Li, Xinghao Chen, Hui Su, Yaohui Jin, Wenjun Zeng, Xiaoyu Shen
In this paper, we introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios.
no code implementations • 8 Mar 2025 • Yanjun Chen, Yirong Sun, Xinghao Chen, Jian Wang, Xiaoyu Shen, Wenjie Li, Wei zhang
Chain-of-Thought (CoT) reasoning has proven effective in natural language tasks but remains underexplored in multimodal alignment.
1 code implementation • 25 Feb 2025 • Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen
Large Language Models (LLMs) excel in reasoning tasks through Chain-of-Thought (CoT) prompting.
no code implementations • 20 Jan 2025 • ZhenLiang Ni, Qiangyu Yan, Mouxiao Huang, Tianning Yuan, Yehui Tang, Hailin Hu, Xinghao Chen, Yunhe Wang
We conduct a comprehensive evaluation of different advanced video generators and present a challenging setting.
1 code implementation • 2 Dec 2024 • Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen
The teacher also helps the student learn the projection of vision tokens into the text embedding space based on the focus of the text.
1 code implementation • 26 Nov 2024 • Xiaowen Ma, ZhenLiang Ni, Xinghao Chen
Based on the analyses, we introduce a novel Laplace mixer to decouple the features in terms of frequency and input only the low-frequency components into the Mamba block.
1 code implementation • 28 Oct 2024 • Yirong Sun, Dawei Zhu, Yanjun Chen, Erjia Xiao, Xinghao Chen, Xiaoyu Shen
This work investigates the inherent capability of instruction-tuned LLMs for document-level translation (docMT).
1 code implementation • 9 Oct 2024 • Yanjun Chen, Dawei Zhu, Yirong Sun, Xinghao Chen, Wei zhang, Xiaoyu Shen
Reinforcement Learning from Human Feedback significantly enhances Natural Language Processing by aligning language models with human expectations.
1 code implementation • 12 Jul 2024 • Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen
However, the quality of pseudo labels in the framework, which is a key factor in the final result, has not been carefully studied.
1 code implementation • 17 Jun 2024 • Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang
For instance, we achieve approximately $70\times$ compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks.
1 code implementation • 3 Jun 2024 • Ding Jia, Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Chang Xu, Xinghao Chen
Cross-modal transformers have demonstrated superiority in various vision tasks by effectively integrating different modalities.
Ranked #1 on Semantic Segmentation on SUN-RGBD
3 code implementations • 19 May 2024 • Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and training collapse.
3 code implementations • 14 May 2024 • Yingjie Zhai, Wenshuo Li, Yehui Tang, Xinghao Chen, Yunhe Wang
In this paper, we propose to squeeze the time axis of a video sequence into the channel dimension and present a lightweight video recognition network, termed \textit{SqueezeTime}, for mobile video understanding.
3 code implementations • 10 May 2024 • Xiaowen Ma, ZhenLiang Ni, Xinghao Chen
Specifically, we employ the coarse masks obtained from the fixed prototypes as a guide to adjust the fixed prototype towards the center of the semantic and spatial domains in the test image.
1 code implementation • 10 May 2024 • ZhenLiang Ni, Xinghao Chen, Yingjie Zhai, Yehui Tang, Yunhe Wang
A shape self-calibration function is designed to make the key areas closer to foreground objects.
3 code implementations • 21 Dec 2023 • Xinghao Chen, Siwei Li, Yijing Yang, Yunhe Wang
The proposed framework, i.e., Detection ConvNet (DECO), is composed of a backbone and a convolutional encoder-decoder architecture.
2 code implementations • 21 Dec 2023 • Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen
Many follow-up works have developed various applications based on the pre-trained SAM and achieved impressive performance on downstream vision tasks.
2 code implementations • 3 Oct 2023 • Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen
Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks.
Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)
4 code implementations • ICCV 2023 • Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang
DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models.
5 code implementations • 29 Dec 2022 • Yixing Xu, Xinghao Chen, Yunhe Wang
This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs).
3 code implementations • 17 Aug 2022 • Zhijun Tu, Xinghao Chen, Pengju Ren, Yunhe Wang
Because modern deep neural networks rely on sophisticated designs and complex architectures to reach high accuracy, the distributions of their weights and activations are highly diverse.
1 code implementation • International Conference on Machine Learning 2022 • Yanxi Li, Xinghao Chen, Minjing Dong, Yehui Tang, Yunhe Wang, Chang Xu
Recently, neural architectures with all Multi-layer Perceptrons (MLPs) have attracted great research interest from the computer vision community.
Ranked #538 on Image Classification on ImageNet
11 code implementations • journal 2022 • Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang
Many adaptations of transformers have emerged to address the single-modal vision tasks, where self-attention modules are stacked to handle input sources like images.
Ranked #4 on Semantic Segmentation on LLRGBD-synthetic
no code implementations • 28 Mar 2022 • Min Zhong, Xinghao Chen, Xiaokang Chen, Gang Zeng, Yunhe Wang
For instance, our approach achieves a 66.4\% mAP with the 0.5 IoU threshold on the ScanNetV2 test set, which is 1.9\% higher than the state-of-the-art method.
Ranked #7 on 3D Instance Segmentation on S3DIS
no code implementations • CVPR 2022 • Hongyang Gu, Jianmin Li, Guangyuan Fu, Chifong Wong, Xinghao Chen, Jun Zhu
In this paper, we propose a novel method, AutoLoss-GMS, to automatically search for a better loss function in the space of generalized margin-based softmax loss functions for person re-identification.
no code implementations • NeurIPS 2021 • Xinghao Chen, Chang Xu, Minjing Dong, Chunjing Xu, Yunhe Wang
Adder neural networks (AdderNets) have shown impressive performance on image classification with only addition operations, which are more energy efficient than traditional convolutional neural networks built with multiplications.
no code implementations • NeurIPS 2021 • Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu
Adder neural network (AdderNet) replaces the original convolutions, which involve massive multiplications, with cheap additions while achieving comparable performance, thus yielding a series of energy-efficient neural networks.
no code implementations • NeurIPS 2021 • Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu
Adder neural networks (ANNs) are designed for low energy cost: they replace expensive multiplications in convolutional neural networks (CNNs) with cheaper additions to yield energy-efficient neural networks and hardware accelerations.
10 code implementations • CVPR 2022 • Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, making them inflexible for different input sizes and hard to capture spatial information.
14 code implementations • CVPR 2022 • Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu
Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.
no code implementations • CVPR 2021 • Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Xinghao Chen, Chunjing Xu, Chang Xu, Yunhe Wang
In this paper, we present a positive-unlabeled learning based scheme to expand training data by purifying valuable images from massive unlabeled ones, where the original training data are viewed as positive data and the unlabeled images in the wild are unlabeled data.
no code implementations • CVPR 2021 • Yiman Zhang, Hanting Chen, Xinghao Chen, Yiping Deng, Chunjing Xu, Yunhe Wang
Experiments on various datasets and architectures demonstrate that the proposed method can be used to effectively learn portable student networks without the original data, e.g., with a 0.16 dB PSNR drop on Set5 for x2 super resolution.
no code implementations • 12 May 2021 • Wenshuo Li, Hanting Chen, Mingqiang Huang, Xinghao Chen, Chunjing Xu, Yunhe Wang
Adder neural network (AdderNet) is a new kind of deep model that replaces the original massive multiplications in convolutions by additions while preserving the high performance.
1 code implementation • CVPR 2021 • Jianyuan Guo, Kai Han, Yunhe Wang, Han Wu, Xinghao Chen, Chunjing Xu, Chang Xu
To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
no code implementations • 23 Dec 2020 • Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, DaCheng Tao
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism.
1 code implementation • 3 Nov 2020 • Bochao Wang, Hang Xu, Jiajin Zhang, Chen Chen, Xiaozhi Fang, Yixing Xu, Ning Kang, Lanqing Hong, Chenhan Jiang, Xinyue Cai, Jiawei Li, Fengwei Zhou, Yong Li, Zhicheng Liu, Xinghao Chen, Kai Han, Han Shu, Dehua Song, Yunhe Wang, Wei zhang, Chunjing Xu, Zhenguo Li, Wenzhi Liu, Tong Zhang
Automated Machine Learning (AutoML) is an important industrial solution for automatic discovery and deployment of the machine learning models.
no code implementations • NeurIPS 2020 • Yixing Xu, Chang Xu, Xinghao Chen, Wei zhang, Chunjing Xu, Yunhe Wang
A convolutional neural network (CNN) with the same architecture is simultaneously initialized and trained as a teacher network; features and weights of the ANN and CNN will be transformed to a new space to eliminate the accuracy drop.
no code implementations • 16 Jul 2020 • Xinghao Chen, Yiman Zhang, Yunhe Wang
To identify the redundancy in segmentation networks, we present a multi-task channel pruning approach.
no code implementations • ECCV 2020 • Xinghao Chen, Yiman Zhang, Yunhe Wang, Han Shu, Chunjing Xu, Chang Xu
This paper proposes to learn a lightweight video style transfer network via knowledge distillation paradigm.
6 code implementations • CVPR 2021 • Zhaohui Yang, Yunhe Wang, Xinghao Chen, Jianyuan Guo, Wei zhang, Chao Xu, Chunjing Xu, DaCheng Tao, Chang Xu
To achieve an extremely fast NAS while preserving the high accuracy, we propose to identify the vital blocks and make them the priority in the architecture search.
1 code implementation • CVPR 2020 • Jianyuan Guo, Kai Han, Yunhe Wang, Chao Zhang, Zhaohui Yang, Han Wu, Xinghao Chen, Chang Xu
To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e., backbone, neck, and head) of an object detector in an end-to-end manner.
1 code implementation • CVPR 2020 • Zhaohui Yang, Yunhe Wang, Xinghao Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, Chang Xu
Architectures in the population that share parameters within one SuperNet in the latest generation will be tuned over the training dataset with a few epochs.
no code implementations • 26 Feb 2019 • Guijin Wang, Cairong Zhang, Xinghao Chen, Xiangyang Ji, Jing-Hao Xue, Hang Wang
To mitigate these limitations and promote further research on hand pose estimation from stereo images, we propose a new large-scale binocular hand pose dataset called THU-Bi-Hand, offering a new perspective for fingertip localization.
no code implementations • Sensors 2019 • Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang, Hang Wang, Li Zhang
Dynamic hand gesture recognition has attracted increasing attention because of its importance for human–computer interaction.
no code implementations • IEEE Access 2018 • Xinghao Chen, Guijin Wang, Cairong Zhang, Tae-Kyun Kim, Xiangyang Ji
The semantic segmentation network assigns semantic labels for each point in the point set.
Ranked #7 on Hand Pose Estimation on MSRA Hands
no code implementations • 26 Apr 2018 • Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang
Different from previous works, we propose a new framework, named Two-Stream Binocular Network (TSBnet) to detect fingertips from binocular images directly.
no code implementations • 2 Apr 2018 • Cairong Zhang, Guijin Wang, Hengkai Guo, Xinghao Chen, Fei Qiao, Huazhong Yang
In real-world HMI, the joints of outstretched fingers, especially the corresponding fingertips, are much more important than other joints.
1 code implementation • CVPR 2018 • Shanxin Yuan, Guillermo Garcia-Hernando, Bjorn Stenger, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee, Pavlo Molchanov, Jan Kautz, Sina Honari, Liuhao Ge, Junsong Yuan, Xinghao Chen, Guijin Wang, Fan Yang, Kai Akiyama, Yang Wu, Qingfu Wan, Meysam Madadi, Sergio Escalera, Shile Li, Dongheui Lee, Iason Oikonomidis, Antonis Argyros, Tae-Kyun Kim
Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018
Ranked #5 on Hand Pose Estimation on HANDS 2017
1 code implementation • 11 Aug 2017 • Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang
The proposed method extracts regions from the feature maps of a convolutional neural network under the guidance of an initially estimated pose, generating better and more representative features for hand pose estimation.
Ranked #8 on Hand Pose Estimation on HANDS 2017
no code implementations • 10 Aug 2017 • Xinghao Chen, Hengkai Guo, Guijin Wang, Li Zhang
Dynamic hand gesture recognition has attracted increasing interest because of its importance for human-computer interaction.
Ranked #9 on Hand Gesture Recognition on DHG-28
no code implementations • 23 Jul 2017 • Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang
3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction.
Ranked #4 on Pose Estimation on ITOP top-view
no code implementations • 8 Feb 2017 • Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang
Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction.
Ranked #11 on Hand Pose Estimation on MSRA Hands
no code implementations • 23 Dec 2016 • Hengkai Guo, Guijin Wang, Xinghao Chen
Accurate detection of fingertips in depth images is critical for human-computer interaction.