no code implementations • 15 Aug 2024 • Ruihang Li, Yixuan Wei, Miaosen Zhang, Nenghai Yu, Han Hu, Houwen Peng
Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.
1 code implementation • 30 May 2024 • Bolin Ni, Jingcheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, Gaofeng Meng, Han Hu
In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs).
1 code implementation • 7 Mar 2024 • Chen Li, Weiqi Wang, Jingcheng Hu, Yixuan Wei, Nanning Zheng, Han Hu, Zheng Zhang, Houwen Peng
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities, as evidenced by its impressive accuracy of 97. 7% and 72. 0% on the GSM8K and MATH benchmarks, respectively, when selecting the best response from 256 random generations.
1 code implementation • 27 Oct 2023 • Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, Peng Cheng
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs).
1 code implementation • ICCV 2023 • Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi, Chen, Xinggang Wang, Hongyang Chao, Han Hu
In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.
1 code implementation • ICCV 2023 • Ben Kang, Xin Chen, Dong Wang, Houwen Peng, Huchuan Lu
The Bridge Module incorporates the high-level information of deep features into the shallow large-resolution features.
3 code implementations • CVPR 2023 • Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan
Comprehensive experiments demonstrate EfficientViT outperforms existing efficient models, striking a good trade-off between speed and accuracy.
1 code implementation • CVPR 2023 • Xin Chen, Ben Kang, Jiawen Zhu, Dong Wang, Houwen Peng, Huchuan Lu
In this paper, we introduce a new sequence-to-sequence learning framework for RGB-based and multi-modal object tracking.
Ranked #1 on Rgb-T Tracking on LasHeR
no code implementations • CVPR 2023 • Yixuan Wei, Yue Cao, Zheng Zhang, Houwen Peng, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo
This paper presents a method that effectively combines two prevalent visual recognition methods, i. e., image classification and contrastive language-image pre-training, dubbed iCLIP.
1 code implementation • ICCV 2023 • Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang
To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description.
2 code implementations • 4 Aug 2022 • Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling
Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.
Ranked #9 on Zero-Shot Action Recognition on Kinetics
2 code implementations • 21 Jul 2022 • Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
It achieves a top-1 accuracy of 84. 8% on ImageNet-1k with only 21M parameters, being comparable to Swin-B pretrained on ImageNet-21k while using 4. 2 times fewer parameters.
Ranked #135 on Image Classification on ImageNet
3 code implementations • 9 Jun 2022 • Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, Bernard Ghanem
In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions.
Ranked #3 on 3D Semantic Segmentation on OpenTrench3D
2 code implementations • CVPR 2022 • Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks.
Ranked #213 on Image Classification on ImageNet
2 code implementations • NeurIPS 2021 • Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, Jianlong Fu, Hongyang Chao, Haibin Ling
Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection, and thus been attracting fast-growing efforts on manually designing more effective architectures.
1 code implementation • ICCV 2021 • Jilai Zheng, Chao Ma, Houwen Peng, Xiaokang Yang
In this paper, we propose to learn an Unsupervised Single Object Tracker (USOT) from scratch.
1 code implementation • ICCV 2021 • Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao
We then propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).
Ranked #152 on Object Detection on COCO minival
2 code implementations • ICCV 2021 • Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
Specifically, the performance of these subnets with weights inherited from the supernet is comparable to those retrained from scratch.
Ranked #1 on Fine-Grained Image Classification on Oxford 102 Flowers (Top 1 Accuracy metric)
no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.
no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.
1 code implementation • CVPR 2021 • Bin Yan, Houwen Peng, Kan Wu, Dong Wang, Jianlong Fu, Huchuan Lu
Object tracking has achieved significant progress over the past few years.
1 code implementation • CVPR 2021 • Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
In this paper, we propose a one-shot neural ensemble architecture search (NEAS) solution that addresses the two challenges.
1 code implementation • ICCV 2021 • Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, Huchuan Lu
In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component.
Ranked #21 on Visual Object Tracking on TrackingNet
1 code implementation • 4 Dec 2020 • Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo
It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.
2 code implementations • NeurIPS 2020 • Houwen Peng, Hao Du, Hongyuan Yu, Qi Li, Jing Liao, Jianlong Fu
The experiments on ImageNet verify such path distillation method can improve the convergence ratio and performance of the hypernetwork, as well as boosting the training of subnetworks.
1 code implementation • 22 Aug 2020 • Le Yang, Houwen Peng, Dingwen Zhang, Jianlong Fu, Junwei Han
To address this problem, this paper proposes a novel anchor-free action localization module that assists action localization by temporal points.
1 code implementation • 6 Aug 2020 • Zhipeng Zhang, Bing Li, Weiming Hu, Houwen Peng
We first build a look-up-table (LUT) with the ground-truth mask in the starting frame, and then retrieves the LUT to obtain an attention map for spatial constraints.
4 code implementations • ECCV 2020 • Zhipeng Zhang, Houwen Peng, Jianlong Fu, Bing Li, Weiming Hu
In this paper, we propose a novel object-aware anchor-free network to address this issue.
Ranked #2 on Visual Object Tracking on VOT2019
3 code implementations • 18 Jun 2020 • Hongyuan Yu, Houwen Peng, Yan Huang, Jianlong Fu, Hao Du, Liang Wang, Haibin Ling
First, the search network generates an initial architecture for evaluation, and the weights of the evaluation network are optimized.
Ranked #18 on Neural Architecture Search on NAS-Bench-201, CIFAR-10
1 code implementation • CVPR 2020 • Yizhuo Zhang, Zhirong Wu, Houwen Peng, Stephen Lin
Semi-supervised video object segmentation aims to separate a target object from a video sequence, given the mask in the first frame.
3 code implementations • 8 Dec 2019 • Songyang Zhang, Houwen Peng, Jianlong Fu, Jiebo Luo
We address the problem of retrieving a specific moment from an untrimmed video by a query sentence.
2 code implementations • 8 Dec 2019 • Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, Jiebo Luo
In this report, we introduce the Winner method for HACS Temporal Action Localization Challenge 2019.
5 code implementations • CVPR 2019 • Zhipeng Zhang, Houwen Peng
Siamese networks have drawn great attention in visual tracking because of their balanced accuracy and speed.
Ranked #2 on Visual Object Tracking on VOT2017
no code implementations • CVPR 2013 • Bing Li, Weihua Xiong, Weiming Hu, Houwen Peng
In this paper, we propose a novel bilayer sparse coding model for illumination estimation that considers image similarity in terms of both low level color distribution and high level image scene content simultaneously.