1 code implementation • ICML 2020 • Hai Phan, My T. Thai, Han Hu, Ruoming Jin, Tong Sun, Dejing Dou
In this paper, we aim to develop a scalable algorithm to preserve differential privacy (DP) in adversarial learning for deep neural networks (DNNs), with certified robustness to adversarial examples.
no code implementations • CCL 2020 • Han Hu, Pengyuan Liu
Relation classification, a key step in building structured knowledge, has received much attention in natural language processing. However, in many application domains (e.g., medicine and finance), collecting enough data to train relation classification models is very difficult. In recent years, few-shot learning, which requires only a small number of training samples, has emerged across many fields. This paper presents a systematic survey of recent few-shot relation classification models and methods. Based on the metric used, existing methods are divided into prototype-based and distribution-based approaches; based on whether additional information is exploited, models are divided into pre-trained and non-pre-trained categories. Beyond the standard few-shot setting, we also review few-shot learning in cross-domain and low-resource scenarios, discuss the limitations of current few-shot relation classification methods, and analyze the technical challenges of cross-domain few-shot learning. Finally, we outline future directions for few-shot relation classification.
no code implementations • 9 Sep 2024 • Shuai Wang, Yibing Zhan, Yong Luo, Han Hu, Wei Yu, Yonggang Wen, DaCheng Tao
This mechanism assigns different weights to different categories of data according to the gradient of the output score, and uses knowledge distillation (KD) to reduce the mutual interference between the outputs of old and new tasks.
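The excerpt above compresses two mechanisms into one sentence. Below is a minimal PyTorch sketch of one plausible reading: per-category weights derived from the magnitude of the cross-entropy gradient with respect to the output scores, combined with a standard KD term that keeps old-task outputs stable. The function names, the assumption that old and new heads share one output space, and the alpha/T hyper-parameters are illustrative, not the paper's actual interface.

```python
import torch
import torch.nn.functional as F

def class_weights_from_gradient(logits, labels, num_classes):
    """Weight each category by the mean magnitude of dCE/dlogits (= softmax - onehot)."""
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(labels, num_classes).float()
    grad_mag = (probs - onehot).norm(dim=1)                 # per-sample gradient magnitude
    weights = torch.ones(num_classes, device=logits.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            weights[c] = grad_mag[mask].mean()
    return weights / weights.mean()                         # normalise around 1

def weighted_ce_plus_kd(new_logits, old_logits, labels, num_classes, T=2.0, alpha=0.5):
    w = class_weights_from_gradient(new_logits.detach(), labels, num_classes)
    ce = F.cross_entropy(new_logits, labels, weight=w)      # weighted new-task loss
    kd = F.kl_div(F.log_softmax(new_logits / T, dim=1),
                  F.softmax(old_logits / T, dim=1),
                  reduction="batchmean") * T * T            # keep old-task behaviour close to old model
    return alpha * ce + (1 - alpha) * kd
```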
no code implementations • 26 Aug 2024 • Yilong Chen, Chao Hu, Zixiang Ren, Han Hu, Jie Xu, Lexi Xu, Lei Liu, Shuguang Cui
Furthermore, we consider beam scanning for sensing, in which the joint beams sweep across different directions over time to sense potential targets.
1 code implementation • 19 Aug 2024 • Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, DaCheng Tao
Deep model training on extensive datasets is increasingly becoming cost-prohibitive, prompting the widespread adoption of deep model fusion techniques to leverage knowledge from pre-existing models.
no code implementations • 19 Aug 2024 • Xingrun Yan, Shiyuan Zuo, Rongfei Fan, Han Hu, Li Shen, Puning Zhao, Yong Luo
In a real federated learning (FL) system, communication overhead for passing model parameters between the clients and the parameter server (PS) is often a bottleneck.
no code implementations • 18 Aug 2024 • Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Li Shen, Puning Zhao, Jie Xu, Han Hu
When the loss function is strongly convex, the rate of achieving a zero optimality gap can be improved to linear.
no code implementations • 15 Aug 2024 • Ruihang Li, Yixuan Wei, Miaosen Zhang, Nenghai Yu, Han Hu, Houwen Peng
Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.
2 code implementations • 22 Jun 2024 • Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, David Lo, Binyuan Hui, Niklas Muennighoff, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro von Werra
To assess how well LLMs can solve challenging and practical programming tasks, we introduce Bench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks.
Ranked #1 on Code Generation on BigCodeBench-Instruct
1 code implementation • 14 Jun 2024 • Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du
Once the routers are learned and a preference vector is set, the MoE module can be unloaded, thus no additional computational cost is introduced during inference.
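A hedged sketch of how an MoE module can be "unloaded" once routing coefficients are fixed by a preference vector: for a linear layer, the routed mixture collapses into a single weight matrix, so no extra computation remains at inference. The names and the softmax over a preference vector are illustrative assumptions, not the paper's API.

```python
import torch

def collapse_linear_moe(expert_weights, routing_coeffs):
    """expert_weights: list of [out, in] tensors; routing_coeffs: [num_experts], fixed."""
    stacked = torch.stack(expert_weights)                    # [E, out, in]
    return (routing_coeffs.view(-1, 1, 1) * stacked).sum(dim=0)

experts = [torch.randn(8, 16) for _ in range(4)]
coeffs = torch.softmax(torch.tensor([0.2, 1.0, 0.1, 0.5]), dim=0)
merged = collapse_linear_moe(experts, coeffs)                # used as an ordinary linear layer
```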
1 code implementation • 5 Jun 2024 • Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, DaCheng Tao
These techniques range from model ensemble methods, which combine the predictions to improve the overall performance, to model merging, which integrates different models into a single one, and model mixing methods, which upscale or recombine the components of the original models.
1 code implementation • 30 May 2024 • Bolin Ni, Jingcheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, Gaofeng Meng, Han Hu
In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs).
no code implementations • 24 Apr 2024 • Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, DaCheng Tao
Specifically, FedALC estimates the label correlations in the class embedding learning for different label pairs and utilizes them to improve the model training.
no code implementations • 20 Mar 2024 • Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Han Hu, Hangguan Shan, Tony Q. S. Quek
This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity.
1 code implementation • 7 Mar 2024 • Chen Li, Weiqi Wang, Jingcheng Hu, Yixuan Wei, Nanning Zheng, Han Hu, Zheng Zhang, Houwen Peng
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities, as evidenced by its impressive accuracy of 97.7% and 72.0% on the GSM8K and MATH benchmarks, respectively, when selecting the best response from 256 random generations.
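"Selecting the best response from 256 random generations" is a best-of-n style evaluation. A small illustrative sketch is below; `generate` and `extract_answer` are placeholders for a sampling call and an answer parser, not a specific library API, and the paper's exact scoring pipeline may differ.

```python
def best_of_n_accuracy(problems, generate, extract_answer, n=256):
    """problems: list of {"question": ..., "answer": ...} dicts."""
    solved = 0
    for prob in problems:
        candidates = [generate(prob["question"]) for _ in range(n)]
        # a problem counts as solved if any sampled response yields the reference answer
        if any(extract_answer(c) == prob["answer"] for c in candidates):
            solved += 1
    return solved / len(problems)
```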
2 code implementations • 29 Feb 2024 • Anton Lozhkov, Raymond Li, Loubna Ben allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries
Our large model, StarCoder2-15B, significantly outperforms other models of comparable size.
Ranked #26 on Code Generation on MBPP
1 code implementation • 7 Feb 2024 • Jianyuan Guo, Zhiwei Hao, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu
Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding.
no code implementations • 12 Jan 2024 • Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, DaCheng Tao
Sentiment analysis is rapidly advancing by utilizing various data modalities (e.g., text, image).
1 code implementation • CVPR 2024 • Jiayi Guan, Li Shen, Ao Zhou, Lusong Li, Han Hu, Xiaodong He, Guang Chen, Changjun Jiang
Multi-constraint offline reinforcement learning (RL) promises to learn policies that satisfy both cumulative and state-wise costs from offline datasets.
no code implementations • 23 Dec 2023 • Yifeng Lyu, Han Hu, Rongfei Fan, Zhi Liu, Jianping An, Shiwen Mao
To address these challenges, we study packet routing with ground stations and satellites working jointly to transmit packets, while prioritizing fast communication and meeting energy efficiency and packet loss requirements.
no code implementations • 22 Dec 2023 • Hanxiang He, Han Hu, Xintao Huan, Heng Liu, Jianping An, Shiwen Mao
Deep learning has significantly advanced wireless sensing technology by leveraging substantial amounts of high-quality training data.
1 code implementation • 11 Dec 2023 • Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, DaCheng Tao
At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model.
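A minimal sketch of the bi-level idea above, assuming the shared mask is a relaxed Bernoulli (binary Concrete) sample over parameter entries and the inner merge is a simple average of masked task vectors; the real objective and merge rule are richer than this.

```python
import torch

def sample_concrete_mask(mask_logits, temperature=0.5):
    """Relaxed Bernoulli (binary Concrete) sample, differentiable w.r.t. mask_logits."""
    u = torch.rand_like(mask_logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)
    return torch.sigmoid((mask_logits + noise) / temperature)

def merge_in_masked_subspace(base_params, task_vectors, mask_logits):
    mask = sample_concrete_mask(mask_logits)                 # one mask shared across all tasks
    masked = [mask * tv for tv in task_vectors]              # restrict merging to the identified subspace
    return base_params + torch.stack(masked).mean(dim=0)
```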
1 code implementation • CVPR 2024 • Xiaoke Huang, JianFeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu
We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.
no code implementations • 30 Nov 2023 • Zhen Xing, Qi Dai, Zihao Zhang, Hui Zhang, Han Hu, Zuxuan Wu, Yu-Gang Jiang
Our model can edit and translate the desired results within seconds based on user instructions.
1 code implementation • CVPR 2024 • Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang
This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.
no code implementations • 28 Nov 2023 • Jie Li, Zhixin Li, Zhi Liu, Pengyuan Zhou, Richang Hong, Qiyue Li, Han Hu
To our knowledge, this is the first comprehensive study of viewport prediction in volumetric video streaming.
no code implementations • 26 Nov 2023 • Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian
Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a single portrait image.
1 code implementation • CVPR 2024 • Ziwei Liao, Jialiang Zhu, Chunyu Wang, Han Hu, Steven L. Waslander
In this work, we aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation.
1 code implementation • NeurIPS 2023 • Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu
To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
1 code implementation • 27 Oct 2023 • Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, Peng Cheng
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs).
1 code implementation • 16 Oct 2023 • Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang
However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain.
1 code implementation • 12 Oct 2023 • Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, DaCheng Tao
LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks.
2 code implementations • NeurIPS 2023 • Han Hu, Haolan Zhan, Yujin Huang, Di Liu
There are currently several publicly accessible GUI page datasets for phones, but none for pairwise GUIs between phones and tablets.
1 code implementation • 7 Oct 2023 • Anke Tang, Li Shen, Yong Luo, Yibing Zhan, Han Hu, Bo Du, Yixin Chen, DaCheng Tao
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model, outperforming standard adapter tuning and task arithmetic alone.
1 code implementation • 27 Sep 2023 • Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
Specifically, we categorize existing deep model fusion methods as four-fold: (1) "Mode connectivity", which connects the solutions in weight space via a path of non-increasing loss, in order to obtain better initialization for model fusion; (2) "Alignment" matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method, averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning" combines the outputs of diverse models, which is a foundational technique for improving the accuracy and robustness of the final model.
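As a concrete anchor for category (3) above, here is a minimal weight-averaging sketch; averaging every entry of the state dict (including buffers) is a simplification, and the function name is illustrative rather than taken from the survey.

```python
import torch

def average_state_dicts(state_dicts):
    """Element-wise mean over models that share the same architecture."""
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in keys}

# Usage: merged = average_state_dicts([model_a.state_dict(), model_b.state_dict()])
#        model.load_state_dict(merged)
```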
1 code implementation • ICCV 2023 • Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi Chen, Xinggang Wang, Hongyang Chao, Han Hu
In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.
no code implementations • 12 Sep 2023 • Jindong Gu, Fangyun Wei, Philip Torr, Han Hu
In this work, we first taxonomize the stochastic defense strategies against QBBA.
no code implementations • 10 Sep 2023 • Guanyu Xu, Zhiwei Hao, Yong Luo, Han Hu, Jianping An, Shiwen Mao
Our objective is to achieve fast and energy-efficient collaborative inference while maintaining comparable accuracy compared with large ViTs.
1 code implementation • CVPR 2024 • Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo
We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
1 code implementation • ICCV 2023 • Xin Lai, Yuhui Yuan, Ruihang Chu, Yukang Chen, Han Hu, Jiaya Jia
Therefore, we abandon the mask attention design and resort to an auxiliary center regression task instead.
no code implementations • 24 Aug 2023 • Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen
In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples.
no code implementations • 21 Aug 2023 • Shiyuan Zuo, Rongfei Fan, Han Hu, Ning Zhang, Shimin Gong
In this paper, we propose a robust aggregation method for federated learning (FL) that can effectively tackle malicious Byzantine attacks.
no code implementations • CVPR 2024 • Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang
In this work, we propose a Simple Diffusion Adapter (SimDA) that fine-tunes only 24M out of 1.1B parameters of a strong T2I model, adapting it to video generation in a parameter-efficient way.
no code implementations • 17 Aug 2023 • Rongfei Fan, Xuming An, Shiyuan Zuo, Han Hu
For smooth and strongly convex loss functions, we prove that our proposed method achieves the minimal training loss, up to any small positive tolerance, at a linear rate.
1 code implementation • 17 Aug 2023 • Xuming An, Rongfei Fan, Shiyuan Zuo, Han Hu, Hai Jiang, Ning Zhang
For parameter aggregating in FL, over-the-air computation is a spectrum-efficient solution, which allows all mobile devices to transmit their parameter-mapped signals concurrently to a BS.
no code implementations • 11 Aug 2023 • Rui Xu, Yong Luo, Han Hu, Bo Du, Jialie Shen, Yonggang Wen
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
1 code implementation • 8 Aug 2023 • Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
Ranked #2 on 3D Object Detection on ScanNetV2
1 code implementation • 7 Aug 2023 • Zhuang Qi, Lei Meng, Zitan Chen, Han Hu, Hui Lin, Xiangxu Meng
To address this issue, this paper presents a cross-silo prototypical calibration method (FedCSPC), which takes additional prototype information from the clients to learn a unified feature space on the server side.
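A hedged sketch of the prototype exchange this excerpt describes: each client reports per-class feature means, which the server averages into unified class prototypes. This is only the aggregation step under simple assumptions, not FedCSPC's full calibration procedure, and the function names are illustrative.

```python
import torch

def client_prototypes(features, labels, num_classes):
    """Per-class mean feature vectors computed locally on one client."""
    protos = {}
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(dim=0)
    return protos

def aggregate_prototypes(all_client_protos, num_classes):
    """Server-side average over clients that observed each class."""
    merged = {}
    for c in range(num_classes):
        vecs = [p[c] for p in all_client_protos if c in p]
        if vecs:
            merged[c] = torch.stack(vecs).mean(dim=0)
    return merged
```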
1 code implementation • 3 Aug 2023 • Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu
This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.
1 code implementation • 1 Aug 2023 • Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, Jialie Shen
Recently, the efficient deployment and acceleration of powerful vision transformers (ViTs) on resource-limited edge devices for providing multimedia services have become attractive tasks.
2 code implementations • 19 Jun 2023 • Ting Zhe, Jing Zhang, YongQian Li, Yong Luo, Han Hu, DaCheng Tao
To fill this gap, we introduce the FHA-Kitchens (Fine-Grained Hand Actions in Kitchen Scenes) dataset, providing both coarse- and fine-grained hand action categories along with localization annotations.
no code implementations • 7 Jun 2023 • Libin Wang, Han Hu, Qisen Shang, Bo Xu, Qing Zhu
The lack of façade structures in photogrammetric mesh models renders them inadequate for meeting the demands of intricate applications.
1 code implementation • NeurIPS 2023 • Yukang Yang, Dongnan Gui, Yuhui Yuan, Weicong Liang, Haisong Ding, Han Hu, Kai Chen
We evaluate the effectiveness of our approach by measuring OCR-based metrics, CLIP score, and FID of the generated visual text.
1 code implementation • 25 May 2023 • Zhiwei Hao, Jianyuan Guo, Kai Han, Han Hu, Chang Xu, Yunhe Wang
The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results.
1 code implementation • 23 May 2023 • Anke Tang, Yong Luo, Han Hu, Fengxiang He, Kehua Su, Bo Du, Yixin Chen, DaCheng Tao
This paper studies multiparty learning, aiming to learn a model using the private data of different participants.
3 code implementations • CVPR 2023 • Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan
Comprehensive experiments demonstrate EfficientViT outperforms existing efficient models, striking a good trade-off between speed and accuracy.
1 code implementation • ICCV 2023 • Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang
While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already captures the essential information without temporal attention.
Ranked #17 on Action Classification on Kinetics-400
1 code implementation • 14 Apr 2023 • Cheng Liao, Han Hu, Xuekun Yuan, Haifeng Li, Chao Liu, Chunyang Liu, Gui Fu, Yulin Ding, Qing Zhu
This contrastive learning strategy allows us to inject building semantics into the change-detection pipeline by making building features more distinguishable from non-building features.
no code implementations • 12 Apr 2023 • Haojia Yu, Han Hu, Bo Xu, Qisen Shang, Zhendong Wang, Qing Zhu
Most urban applications necessitate building footprints in the form of concise vector graphics with sharp boundaries rather than pixel-wise raster images.
no code implementations • 30 Mar 2023 • Qisen Shang, Han Hu, Haojia Yu, Bo Xu, Libin Wang, Qing Zhu
Experimental results on publicly available façade image and 3D model datasets demonstrate that our method yields superior results and effectively addresses issues associated with flawed textures.
1 code implementation • CVPR 2023 • Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, Han Hu
Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings.
Ranked #1 on Pose Estimation on MPII Human Pose
2 code implementations • 18 Mar 2023 • Firas Al-Hindawi, Md Mahfuzur Rahman Siddiquee, Teresa Wu, Han Hu, Ying Sun
Cross-domain classification frameworks were developed to handle this data domain shift problem by utilizing unsupervised image-to-image translation models to translate an input image from the unlabeled domain to the labeled domain.
2 code implementations • ICCV 2023 • Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, Baining Guo
Denoising diffusion models have been a mainstream approach for image generation, however, training these models often suffers from slow convergence.
Ranked #3 on Image Generation on ImageNet 256x256
1 code implementation • 15 Mar 2023 • Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu
Deep supervision, which adds extra supervision to the intermediate features of a neural network, was widely used for image classification in the early deep learning era, since it significantly reduces training difficulty and eases optimization, for example by mitigating vanishing gradients relative to vanilla training.
1 code implementation • 7 Mar 2023 • Rui Xu, Zhi Liu, Yong Luo, Han Hu, Li Shen, Bo Du, Kaiming Kuang, Jiancheng Yang
To address this issue, we propose a slice grouped domain attention (SGDA) module to enhance the generalization capability of the pulmonary nodule detection networks.
no code implementations • 24 Feb 2023 • Guanghao Li, Li Shen, Yan Sun, Yue Hu, Han Hu, DaCheng Tao
Federated learning (FL) enables multiple clients to train a machine learning model collaboratively without exchanging their local data.
3 code implementations • CVPR 2023 • Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai
A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.
no code implementations • 15 Feb 2023 • Dui Wang, Li Shen, Yong Luo, Han Hu, Kehua Su, Yonggang Wen, DaCheng Tao
In particular, we adopt the "one-vs-all" training strategy in each client to alleviate the unfair competition between classes by constructing a personalized binary classification problem for each class.
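The "one-vs-all" idea can be illustrated with a few lines: each class gets its own binary objective, so classes do not compete through a shared softmax. This is only a generic sketch of the strategy named in the excerpt, not the paper's personalized client-side formulation.

```python
import torch.nn.functional as F

def one_vs_all_loss(logits, labels, num_classes):
    """One independent binary classification problem per class."""
    targets = F.one_hot(labels, num_classes).float()
    return F.binary_cross_entropy_with_logits(logits, targets)
```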
1 code implementation • 8 Feb 2023 • Yujin Huang, Terry Yue Zhuo, Qiongkai Xu, Han Hu, Xingliang Yuan, Chunyang Chen
In this work, we propose Training-Free Lexical Backdoor Attack (TFLexAttack) as the first training-free backdoor attack on language models.
1 code implementation • ICCV 2023 • Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu
With these new techniques and other designs, we show that the proposed general-purpose task-solver can perform both instance segmentation and depth estimation well.
Ranked #19 on Monocular Depth Estimation on NYU-Depth V2
2 code implementations • CVPR 2023 • Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu
Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget.
1 code implementation • ICCV 2023 • Yixuan Wei, Han Hu, Zhenda Xie, Ze Liu, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo
Experiments suggest that the feature map distillation approach significantly boosts the fine-tuning performance of CLIP models on several typical downstream vision tasks.
no code implementations • CVPR 2023 • Yixuan Wei, Yue Cao, Zheng Zhang, Houwen Peng, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo
This paper presents a method that effectively combines two prevalent visual recognition methods, i.e., image classification and contrastive language-image pre-training, dubbed iCLIP.
1 code implementation • ICCV 2023 • Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu
This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.
no code implementations • 18 Dec 2022 • Firas Al-Hindawi, Tejaswi Soori, Han Hu, Md Mahfuzur Rahman Siddiquee, Hyunsoo Yoon, Teresa Wu, Ying Sun
To deal with datasets from new domains, a model needs to be trained from scratch.
1 code implementation • ICCV 2023 • Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang
To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description.
1 code implementation • CVPR 2023 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang
We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions.
1 code implementation • CVPR 2023 • Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
In this paper, we investigate the use of transformer models under the SSL setting for action recognition.
no code implementations • 21 Nov 2022 • Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato
Image cropping has progressed tremendously under the data-driven paradigm.
1 code implementation • 21 Nov 2022 • Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu
The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.
no code implementations • 3 Nov 2022 • Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao
In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition.
Ranked #3 on Action Recognition In Videos on Kinetics-400
4 code implementations • 3 Oct 2022 • Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, WeiHong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu
Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens.
no code implementations • 20 Sep 2022 • Han Hu, Xingwu Zhu, Fuhui Zhou, Wei Wu, Rose Qingyang Hu, Hongbo Zhu
To effectively exploit the benefits enabled by semantic communication, in this paper, we propose a one-to-many semantic communication system.
no code implementations • 7 Sep 2022 • Mengya Han, Yibing Zhan, Yong Luo, Bo Du, Han Hu, Yonggang Wen, DaCheng Tao
To address the above issues, we propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition.
1 code implementation • 27 Jul 2022 • Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Bo Du
Overall, this work is an attempt to explore the internal relevance between generation tasks and perception tasks by prompt designing.
8 code implementations • CVPR 2023 • Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, WeiHong Lin, Lei Sun, Chao Zhang, Han Hu
One-to-one set matching is a key design for DETR to establish its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) to remove duplicate detections.
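The one-to-one set matching mentioned above is typically realized with the Hungarian algorithm, which is what lets DETR drop NMS. A small illustrative sketch follows; the cost here is just a negative class probability plus an L1 box distance, whereas real DETR matching costs also include terms such as GIoU.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """pred_probs: [P, C], pred_boxes: [P, 4], gt_labels: [G], gt_boxes: [G, 4] (NumPy)."""
    cls_cost = -pred_probs[:, gt_labels]                              # [P, G]
    box_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    cost = cls_cost + box_cost
    pred_idx, gt_idx = linear_sum_assignment(cost)                    # one prediction per ground truth
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))
```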
1 code implementation • 26 Jul 2022 • Phung Lai, Han Hu, NhatHai Phan, Ruoming Jin, My T. Thai, An M. Chen
In this paper, we show that the process of continually learning new tasks and memorizing previous tasks introduces unknown privacy risks and challenges to bound the privacy loss.
1 code implementation • CVPR 2023 • Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Yixuan Wei, Qi Dai, Han Hu
Our study reveals that: (i) Masked image modeling is also demanding on larger data.
2 code implementations • 7 Jun 2022 • Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Yongqiang Xiong
On efficiency, Flex accelerates SwinV2-MoE, achieving up to 1.55x and 2.11x speedup in training and inference over Fairseq, respectively.
1 code implementation • 27 May 2022 • Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo
These properties, which we aggregately refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnosis tools.
Ranked #2 on Instance Segmentation on COCO test-dev (using extra training data)
1 code implementation • CVPR 2023 • Zhenda Xie, Zigang Geng, Jingcheng Hu, Zheng Zhang, Han Hu, Yue Cao
In this paper, we compare MIM with the long-dominant supervised pre-trained models from two perspectives, the visualizations and the experiments, to uncover their key representational differences.
Ranked #3 on Depth Estimation on NYU-Depth V2
1 code implementation • 24 May 2022 • Zhiwei Hao, Yong Luo, Zhi Wang, Han Hu, Jianping An
To tackle this challenge, we propose a framework termed collaborative data-free knowledge distillation via multi-level feature sharing (CDFKD-MFS), which consists of a multi-header student module, an asymmetric adversarial data-free KD module, and an attention-based aggregation module.
1 code implementation • 24 May 2022 • Zhiwei Hao, Guanyu Xu, Yong Luo, Han Hu, Jianping An, Shiwen Mao
In this paper, we study the multi-agent collaborative inference scenario, where a single edge server coordinates the inference of multiple UEs.
no code implementations • 26 Apr 2022 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu-Gang Jiang
With Vision Transformers (ViTs) making great advances in a variety of computer vision tasks, recent literature have proposed various variants of vanilla ViTs to achieve better efficiency and efficacy.
no code implementations • 22 Apr 2022 • Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo
Second, we convert the image classification problem from learning parametric category classifier weights to learning a text encoder as a meta network to generate category classifier weights.
no code implementations • 10 Apr 2022 • Chao Li, Jia Ning, Han Hu, Kun He
Differentiable architecture search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency.
5 code implementations • 5 Apr 2022 • Jiequan Cui, Yuhui Yuan, Zhisheng Zhong, Zhuotao Tian, Han Hu, Stephen Lin, Jiaya Jia
In this paper, we study the problem of class imbalance in semantic segmentation.
Ranked #21 on Semantic Segmentation on ADE20K
2 code implementations • 8 Mar 2022 • Haodi He, Yuhui Yuan, Xiangyu Yue, Han Hu
Given an input image or video, our framework first conducts multi-label classification over the complete label, then sorts the complete label and selects a small subset according to their class confidence scores.
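The classify-then-select step described above reduces to ranking multi-label confidences and keeping a small subset. A minimal sketch, assuming sigmoid confidences and a fixed top-k budget (the paper's selection rule may use thresholds or adaptive sizes):

```python
import torch

def select_label_subset(image_logits, k=20):
    """image_logits: [num_classes] multi-label logits for one image or video."""
    scores = torch.sigmoid(image_logits)          # per-class confidence
    topk = torch.topk(scores, k)                  # keep only the k most confident classes
    return topk.indices.tolist(), topk.values.tolist()
```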
no code implementations • 8 Feb 2022 • Han Hu, Weiwei Song, Qun Wang, Rose Qingyang Hu, Hongbo Zhu
Theoretical analysis proves that the proposed algorithm can achieve a $[O(1/V), O(V)]$ tradeoff between EE and service delay.
no code implementations • 22 Jan 2022 • Han Hu, Xinrong Liang, Yulin Ding, Qisen Shang, Bo Xu, Xuming Ge, Min Chen, Ruofei Zhong, Qing Zhu
Unfortunately, the large amount of interactive sample labeling efforts has dramatically hindered the application of deep learning methods, especially for 3D modeling tasks, which require heterogeneous samples.
2 code implementations • 29 Dec 2021 • Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai
However, semantic segmentation and the CLIP model perform on different visual granularity, that semantic segmentation processes on pixels while CLIP performs on images.
20 code implementations • CVPR 2022 • Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo
Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
Ranked #4 on Image Classification on ImageNet V2 (using extra training data)
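Of the three SwinV2 techniques listed above, scaled cosine attention is the easiest to sketch: attention logits are cosine similarities divided by a learned per-head temperature, keeping their magnitude bounded as models scale. The sketch below is a simplified reading (shapes and the clamp value are illustrative), not the full SwinV2 attention block.

```python
import torch
import torch.nn.functional as F

def scaled_cosine_attention(q, k, v, tau):
    """q, k, v: [batch, heads, tokens, dim]; tau: learnable temperature, e.g. shape [heads, 1, 1]."""
    qn = F.normalize(q, dim=-1)
    kn = F.normalize(k, dim=-1)
    attn = (qn @ kn.transpose(-2, -1)) / tau.clamp(min=0.01)   # cosine similarity over temperature
    return F.softmax(attn, dim=-1) @ v
```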
5 code implementations • CVPR 2022 • Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu
We also leverage this approach to facilitate the training of a 3B model (SwinV2-G) that, with $40\times$ less data than in previous practice, achieves the state-of-the-art on four representative vision benchmarks.
no code implementations • 17 Nov 2021 • Xiaopeng Jiang, Han Hu, Vijaya Datta Mayyuri, An Chen, Devu M. Shila, Adriaan Larmuseau, Ruoming Jin, Cristian Borcea, NhatHai Phan
This article presents the design, implementation, and evaluation of FLSys, a mobile-cloud federated learning (FL) system, which can be a key component for an open ecosystem of FL models and apps.
1 code implementation • NeurIPS 2021 • Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai
We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.
no code implementations • 23 Oct 2021 • Xuming An, Rongfei Fan, Han Hu, Ning Zhang, Saman Atapattu, Theodoros A. Tsiftsis
To solve this challenging problem, we decompose it into a one-dimensional search over the task offloading decision and a non-convex optimization problem with the task offloading decision given.
1 code implementation • NeurIPS 2021 • Hanzhe Hu, Fangyun Wei, Han Hu, Qiwei Ye, Jinshi Cui, LiWei Wang
The confidence bank is leveraged as an indicator to tilt training towards under-performing categories, instantiated in three strategies: 1) adaptive Copy-Paste and CutMix data augmentation approaches which give more chance for under-performing categories to be copied or cut; 2) an adaptive data sampling approach to encourage pixels from under-performing category to be sampled; 3) a simple yet effective re-weighting method to alleviate the training noise raised by pseudo-labeling.
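Strategy 3 above (re-weighting) can be sketched directly from a per-category confidence bank: categories with low running confidence receive larger loss weights. This is a plausible simplified mapping under stated assumptions, not necessarily the paper's exact formula.

```python
import torch

def reweight_from_confidence_bank(confidence_bank, floor=0.1):
    """confidence_bank: [num_classes] running mean confidence per category."""
    weights = 1.0 / confidence_bank.clamp(min=floor)   # under-performing categories get larger weights
    return weights / weights.mean()                    # normalise around 1
```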
1 code implementation • 3 Oct 2021 • Li Chen, Yulin Ding, Saeid Pirasteh, Han Hu, Qing Zhu, Haowei Zeng, Haojia Yu, Qisen Shang, Yongfei Song
The critical problem is that, within each block, the limited samples make it impossible to train and test a model that produces satisfactory LSM predictions, especially in dangerous mountainous areas where landslide surveying is expensive.
no code implementations • 19 Sep 2021 • Qun Wang, Fuhui Zhou, Han Hu, Rose Qingyang Hu
Energy-efficient design is of crucial importance in wireless internet of things (IoT) networks.
1 code implementation • 3 Jul 2021 • Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang
Specifically, we train a tiny student model to match a pre-trained teacher model in the patch-level manifold space.
14 code implementations • CVPR 2022 • Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, Han Hu
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.
Ranked #28 on Action Classification on Kinetics-600 (using extra training data)
8 code implementations • ICCV 2021 • Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
Ranked #6 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
2 code implementations • NeurIPS 2021 • Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
no code implementations • 27 May 2021 • Kaixin Zhang, Hongzhi Wang, Han Hu, Songling Zou, Jiye Qiu, Tongxin Li, Zhishun Wang
In this paper, we present TENSILE, a method for managing GPU memory at tensor granularity to reduce the GPU memory peak while accounting for multiple dynamic workloads.
1 code implementation • 12 May 2021 • Yansong Tang, Zhenyu Jiang, Zhenda Xie, Yue Cao, Zheng Zhang, Philip H. S. Torr, Han Hu
Previous cycle-consistency correspondence learning methods usually leverage image patches for training.
6 code implementations • 10 May 2021 • Zhenda Xie, Yutong Lin, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao, Han Hu
We are witnessing a modeling shift from CNN to Transformers in computer vision.
Ranked #75 on Self-Supervised Image Classification on ImageNet
4 code implementations • ICCV 2021 • Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong
Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of the attention mechanism in Transformers, where the contribution of each point is automatically learned in the network training.
Ranked #3 on 3D Object Detection on SUN-RGBD
no code implementations • CVPR 2021 • Jindong Gu, Volker Tresp, Han Hu
The examination reveals five major new/different components in CapsNet: a transformation process, a dynamic routing layer, a squashing function, a marginal loss other than cross-entropy loss, and an additional class-conditional reconstruction loss for regularization.
73 code implementations • ICCV 2021 • Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
Ranked #2 on Image Classification on OmniBenchmark
1 code implementation • 19 Mar 2021 • Xiaosen Wang, Jiadong Lin, Han Hu, Jingdong Wang, Kun He
Various momentum iterative gradient-based methods are shown to be effective to improve the adversarial transferability.
no code implementations • 16 Mar 2021 • Han Hu, Weiwei Song, Qun Wang, Fuhui Zhou, Rose Qingyang Hu
In this paper, the offloading decision and resource allocation problem is studied with mobility consideration.
no code implementations • 9 Feb 2021 • Qun Wang, Han Hu, Haijian Sun, Rose Qingyang Hu
In this paper, we study the task offloading and resource allocation problem in a non-orthogonal multiple access (NOMA) assisted MEC network with security and energy efficiency considerations.
1 code implementation • 12 Jan 2021 • Yujin Huang, Han Hu, Chunyang Chen
Deep learning has shown its power in many applications, including object detection in images, natural-language understanding, and speech recognition.
no code implementations • ICCVW 2021 • Zhuliang Yao, Yue Cao, Yutong Lin, Ze Liu, Zheng Zhang, Han Hu
Transformer-based vision architectures have attracted great attention because of the strong performance over the convolutional neural networks (CNNs).
3 code implementations • 24 Dec 2020 • Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu
The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position.
Ranked #40 on Instance Segmentation on COCO test-dev
no code implementations • CUHK Course IERG5350 2020 • Xianbo Wang, Han Hu
Our framework successfully discovered a number of evasion payloads for each WAF in our experiments and significantly outperforms the baseline policy.
no code implementations • 26 Nov 2020 • Qing Zhu, Shengzhi Huang, Han Hu, Haifeng Li, Min Chen, Ruofei Zhong
Finally, multi-view information from both the nadir and oblique images is used in a robust voting procedure to label changes in existing buildings.
no code implementations • 25 Nov 2020 • Xuming An, Rongfei Fan, Han Hu, Ning Zhang, Saman Atapattu, Theodoros A. Tsiftsis
To solve this challenging problem, we decompose it into a one-dimensional search over the task offloading decision and a non-convex optimization problem with the task offloading decision given.
1 code implementation • 23 Nov 2020 • Qing Zhu, Qisen Shang, Han Hu, Haojia Yu, Ruofei Zhong
Finally, the completed rendered image is deintegrated to the original texture atlas and the triangles for the vehicles are also flattened for improved meshes.
7 code implementations • CVPR 2021 • Zhenda Xie, Yutong Lin, Zheng Zhang, Yue Cao, Stephen Lin, Han Hu
We argue that the power of contrastive learning has yet to be fully unleashed, as current methods are trained only on instance-level pretext tasks, leading to representations that may be sub-optimal for downstream tasks requiring dense pixel predictions.
4 code implementations • NeurIPS 2020 • Cheng Chi, Fangyun Wei, Han Hu
The proposed module is named bridging visual representations (BVR).
Ranked #65 on Object Detection on COCO test-dev
1 code implementation • NeurIPS 2020 • Yihong Chen, Zheng Zhang, Yue Cao, Li-Wei Wang, Stephen Lin, Han Hu
Though RepPoints provides high performance, we find that its heavy reliance on regression for object localization leaves room for improvement.
Ranked #27 on Object Detection on COCO-O
1 code implementation • ECCV 2020 • Ze Liu, Han Hu, Yue Cao, Zheng Zhang, Xin Tong
Our investigation reveals that despite the different designs of these operators, all of these operators make surprisingly similar contributions to the network performance under the same network input and feature numbers and result in the state-of-the-art accuracy on standard benchmarks.
Ranked #4 on 3D Semantic Segmentation on PartNet
1 code implementation • NeurIPS 2020 • Yue Cao, Zhenda Xie, Bin Liu, Yutong Lin, Zheng Zhang, Han Hu
This paper presents parametric instance classification (PIC) for unsupervised visual feature learning.
5 code implementations • ECCV 2020 • Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu
This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel.
Ranked #20 on Semantic Segmentation on Cityscapes test (using extra training data)
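The split described in the excerpt above can be written down directly. A small sketch of the decomposition of embedded-Gaussian attention logits (not the full disentangled non-local block) is below; the two dropped terms are constant over the key index and therefore vanish after the softmax.

```python
import torch

def disentangle_attention_logits(q, k):
    """q, k: [tokens, dim].

    q_i.k_j = (q_i - mu_q).(k_j - mu_k) + mu_q.k_j + q_i.mu_k - mu_q.mu_k;
    the last two terms do not depend on j, so they cancel in the softmax over keys.
    """
    mu_q = q.mean(dim=0, keepdim=True)
    mu_k = k.mean(dim=0, keepdim=True)
    pairwise = (q - mu_q) @ (k - mu_k).t()        # whitened pairwise term: pixel-pair relationship
    unary = (mu_q @ k.t()).expand_as(pairwise)    # unary term: saliency of each key pixel
    return pairwise, unary
```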
2 code implementations • 1 Apr 2020 • Phung Lai, NhatHai Phan, Han Hu, Anuja Badeti, David Newman, Dejing Dou
In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models.
1 code implementation • ECCV 2020 • Bin Liu, Yue Cao, Yutong Lin, Qi Li, Zheng Zhang, Mingsheng Long, Han Hu
This paper introduces a negative margin loss to metric learning based few-shot learning methods.
2 code implementations • CVPR 2020 • Yihong Chen, Yue Cao, Han Hu, Li-Wei Wang
We argue that there are two important cues for humans to recognize objects in videos: the global semantic information and the local localization information.
Ranked #14 on Video Object Detection on ImageNet VID
1 code implementation • 21 Feb 2020 • Qing Zhu, Zhendong Wang, Han Hu, Linfu Xie, Xuming Ge, Yeting Zhang
Second, aerial models are rendered to the initial ground views, in which the color, depth and normal images are obtained.
1 code implementation • 20 Feb 2020 • Qing Zhu, Lin Chen, Han Hu, Binzhi Xu, Yeting Zhang, Haifeng Li
The second uses a scale attention mechanism to guide the up-sampling of features from the coarse level by a learned weight map.
1 code implementation • 20 Feb 2020 • Han Hu, Libin Wang, Mier Zhang, Yulin Ding, Qing Zhu
Regularized arrangement of primitives on building façades to aligned locations and consistent sizes is important towards structured reconstruction of urban environment.
2 code implementations • ECCV 2020 • Ze Yang, Yinghao Xu, Han Xue, Zheng Zhang, Raquel Urtasun, Li-Wei Wang, Stephen Lin, Han Hu
We present a new object representation, called Dense RepPoints, that utilizes a large set of points to describe an object at multiple levels, including both box level and pixel level.