Search Results for author: Han Hu

Found 161 papers, 103 papers with code

Scalable Differential Privacy with Certified Robustness in Adversarial Learning

1 code implementation ICML 2020 Hai Phan, My T. Thai, Han Hu, Ruoming Jin, Tong Sun, Dejing Dou

In this paper, we aim to develop a scalable algorithm to preserve differential privacy (DP) in adversarial learning for deep neural networks (DNNs), with certified robustness to adversarial examples.

Few-Shot Relation Classification: A Survey (小样本关系分类研究综述)

no code implementations CCL 2020 Han Hu, Pengyuan Liu

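Relation classification, an essential step in building structured knowledge, has attracted considerable attention in natural language processing. In many application domains (such as healthcare and finance), however, collecting enough data to train relation classification models is very difficult. In recent years, few-shot learning, which requires only a small number of training samples, has been emerging across many fields. This paper presents a systematic survey of recent few-shot relation classification models and methods. Based on the metric method used, existing methods are divided into two categories: prototype-based and distribution-based. Based on whether additional information is used, models are divided into pre-trained and non-pre-trained. In addition to few-shot learning under the conventional setting, the paper also reviews few-shot learning in cross-domain and low-resource scenarios, discusses the limitations of current few-shot relation classification methods, and analyzes the technical challenges of cross-domain few-shot learning. Finally, it outlines future directions for few-shot relation classification.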

Few-Shot Relation Classification

Joint Input and Output Coordination for Class-Incremental Learning

no code implementations 9 Sep 2024 Shuai Wang, Yibing Zhan, Yong Luo, Han Hu, Wei Yu, Yonggang Wen, DaCheng Tao

This mechanism assigns different weights to different categories of data according to the gradient of the output score, and uses knowledge distillation (KD) to reduce the mutual interference between the outputs of old and new tasks.

Class Incremental Learning Incremental Learning +1

Integrated Sensing, Communication, and Powering over Multi-antenna OFDM Systems

no code implementations 26 Aug 2024 Yilong Chen, Chao Hu, Zixiang Ren, Han Hu, Jie Xu, Lexi Xu, Lei Liu, Shuguang Cui

Furthermore, we consider the beam scanning for sensing, in which the joint beams scan in different directions over time to sense potential targets.

Scheduling

SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

1 code implementation 19 Aug 2024 Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, DaCheng Tao

Deep model training on extensive datasets is increasingly becoming cost-prohibitive, prompting the widespread adoption of deep model fusion techniques to leverage knowledge from pre-existing models.

Image Classification Text Generation

Sequential Federated Learning in Hierarchical Architecture on Non-IID Datasets

no code implementations 19 Aug 2024 Xingrun Yan, Shiyuan Zuo, Rongfei Fan, Han Hu, Li Shen, Puning Zhao, Yong Luo

In a real federated learning (FL) system, communication overhead for passing model parameters between the clients and the parameter server (PS) is often a bottleneck.

Federated Learning

Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets

no code implementations 18 Aug 2024 Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Li Shen, Puning Zhao, Jie Xu, Han Hu

In cases where the loss function is strongly convex, the rate of achieving a zero optimality gap can be improved to linear.

Federated Learning

ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws

no code implementations 15 Aug 2024 Ruihang Li, Yixuan Wei, Miaosen Zhang, Nenghai Yu, Han Hu, Houwen Peng

Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.

Diversity

Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

1 code implementation 14 Jun 2024 Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

Once the routers are learned and a preference vector is set, the MoE module can be unloaded, thus no additional computational cost is introduced during inference.

Multi-Task Learning

FusionBench: A Comprehensive Benchmark of Deep Model Fusion

1 code implementation 5 Jun 2024 Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, DaCheng Tao

These techniques range from model ensemble methods, which combine the predictions to improve the overall performance, to model merging, which integrates different models into a single one, and model mixing methods, which upscale or recombine the components of the original models.

Image Classification text-classification +2

Xwin-LM: Strong and Scalable Alignment Practice for LLMs

1 code implementation 30 May 2024 Bolin Ni, Jingcheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, Gaofeng Meng, Han Hu

In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs).

Federated Learning with Only Positive Labels by Exploring Label Correlations

no code implementations 24 Apr 2024 Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, DaCheng Tao

Specifically, FedALC estimates the label correlations in the class embedding learning for different label pairs and utilizes it to improve the model training.

Federated Learning Multi-Label Classification

Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity

no code implementations 20 Mar 2024 Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Han Hu, Hangguan Shan, Tony Q. S. Quek

This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity.

Federated Learning

Common 7B Language Models Already Possess Strong Math Capabilities

1 code implementation 7 Mar 2024 Chen Li, Weiqi Wang, Jingcheng Hu, Yixuan Wei, Nanning Zheng, Han Hu, Zheng Zhang, Houwen Peng

This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities, as evidenced by its impressive accuracy of 97.7% and 72.0% on the GSM8K and MATH benchmarks, respectively, when selecting the best response from 256 random generations.

GSM8K Math

Data-efficient Large Vision Models through Sequential Autoregression

1 code implementation 7 Feb 2024 Jianyuan Guo, Zhiwei Hao, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu

Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding.

Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach

no code implementations 23 Dec 2023 Yifeng Lyu, Han Hu, Rongfei Fan, Zhi Liu, Jianping An, Shiwen Mao

To address these challenges, we study packet routing with ground stations and satellites working jointly to transmit packets, while prioritizing fast communication and meeting energy efficiency and packet loss requirements.

Multi-agent Reinforcement Learning

AI Generated Signal for Wireless Sensing

no code implementations 22 Dec 2023 Hanxiang He, Han Hu, Xintao Huan, Heng Liu, Jianping An, Shiwen Mao

Deep learning has significantly advanced wireless sensing technology by leveraging substantial amounts of high-quality training data.

Attribute Denoising +1

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

1 code implementation 11 Dec 2023 Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, DaCheng Tao

At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model.

Meta-Learning Task Arithmetic

Segment and Caption Anything

1 code implementation CVPR 2024 Xiaoke Huang, JianFeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.

Caption Generation object-detection +2

MotionEditor: Editing Video Motion via Content-Aware Diffusion

1 code implementation CVPR 2024 Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.

Video Editing

GAIA: Zero-shot Talking Avatar Generation

no code implementations 26 Nov 2023 Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian

Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a single portrait image.

Diversity

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

1 code implementation NeurIPS 2023 Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu

To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.

Knowledge Distillation

A Survey on Video Diffusion Models

1 code implementation 16 Oct 2023 Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang

However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain.

Image Generation Video Editing +2

Learn From Model Beyond Fine-Tuning: A Survey

1 code implementation 12 Oct 2023 Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, DaCheng Tao

LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks.

Meta-Learning Model Editing

Pairwise GUI Dataset Construction Between Android Phones and Tablets

2 code implementations NeurIPS 2023 Han Hu, Haolan Zhan, Yujin Huang, Di Liu

There are currently several publicly accessible GUI page datasets for phones, but none for pairwise GUIs between phones and tablets.

Parameter Efficient Multi-task Model Fusion with Partial Linearization

1 code implementation 7 Oct 2023 Anke Tang, Li Shen, Yong Luo, Yibing Zhan, Han Hu, Bo Du, Yixin Chen, DaCheng Tao

We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model, outperforming standard adapter tuning and task arithmetic alone.

parameter-efficient fine-tuning Task Arithmetic

Deep Model Fusion: A Survey

1 code implementation 27 Sep 2023 Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen

Specifically, we categorize existing deep model fusion methods as four-fold: (1) "Mode connectivity", which connects the solutions in weight space via a path of non-increasing loss, in order to obtain better initialization for model fusion; (2) "Alignment" matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method, averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning" combines the outputs of diverse models, which is a foundational technique for improving the accuracy and robustness of the final model.

Ensemble Learning

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

1 code implementation ICCV 2023 Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi Chen, Xinggang Wang, Hongyang Chao, Han Hu

In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.

Exploring Non-additive Randomness on ViT against Query-Based Black-Box Attacks

no code implementations 12 Sep 2023 Jindong Gu, Fangyun Wei, Philip Torr, Han Hu

In this work, we first taxonomize the stochastic defense strategies against QBBA.

DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices

no code implementations 10 Sep 2023 Guanyu Xu, Zhiwei Hao, Yong Luo, Han Hu, Jianping An, Shiwen Mao

Our objective is to achieve fast and energy-efficient collaborative inference while maintaining comparable accuracy compared with large ViTs.

Collaborative Inference Knowledge Distillation

PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning

no code implementations 24 Aug 2023 Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen

In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples.

Language Modelling Segmentation

Federated Learning Robust to Byzantine Attacks: Achieving Zero Optimality Gap

no code implementations 21 Aug 2023 Shiyuan Zuo, Rongfei Fan, Han Hu, Ning Zhang, Shimin Gong

In this paper, we propose a robust aggregation method for federated learning (FL) that can effectively tackle malicious Byzantine attacks.

Federated Learning

SimDA: Simple Diffusion Adapter for Efficient Video Generation

no code implementations CVPR 2024 Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang

In this work, we propose a Simple Diffusion Adapter (SimDA) that fine-tunes only 24M out of 1.1B parameters of a strong T2I model, adapting it to video generation in a parameter-efficient way.

Transfer Learning Video Editing +2

Over-the-Air Computation Aided Federated Learning with the Aggregation of Normalized Gradient

no code implementations 17 Aug 2023 Rongfei Fan, Xuming An, Shiyuan Zuo, Han Hu

For a smooth and strongly convex loss function, we prove that our proposed method can achieve the minimal training loss at a linear rate, up to any small positive tolerance.

Federated Learning

Joint Power Control and Data Size Selection for Over-the-Air Computation Aided Federated Learning

1 code implementation 17 Aug 2023 Xuming An, Rongfei Fan, Shiyuan Zuo, Han Hu, Hai Jiang, Ning Zhang

For parameter aggregating in FL, over-the-air computation is a spectrum-efficient solution, which allows all mobile devices to transmit their parameter-mapped signals concurrently to a BS.

Federated Learning

Rethinking the Localization in Weakly Supervised Object Localization

no code implementations 11 Aug 2023 Rui Xu, Yong Luo, Han Hu, Bo Du, Jialie Shen, Yonggang Wen

Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.

Object Weakly-Supervised Object Localization

Cross-Silo Prototypical Calibration for Federated Learning with Non-IID Data

1 code implementation 7 Aug 2023 Zhuang Qi, Lei Meng, Zitan Chen, Han Hu, Hui Lin, Xiangxu Meng

To address this issue, this paper presents a cross-silo prototypical calibration method (FedCSPC), which takes additional prototype information from the clients to learn a unified feature space on the server side.

Contrastive Learning Federated Learning +1

DETR Doesn't Need Multi-Scale or Locality Design

1 code implementation 3 Aug 2023 Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Decoder

LGViT: Dynamic Early Exiting for Accelerating Vision Transformer

1 code implementation 1 Aug 2023 Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, Jialie Shen

Recently, the efficient deployment and acceleration of powerful vision transformers (ViTs) on resource-limited edge devices for providing multimedia services have become attractive tasks.

Multi-Granularity Hand Action Detection

2 code implementations 19 Jun 2023 Ting Zhe, Jing Zhang, YongQian Li, Yong Luo, Han Hu, DaCheng Tao

To fill this gap, we introduce the FHA-Kitchens (Fine-Grained Hand Actions in Kitchen Scenes) dataset, providing both coarse- and fine-grained hand action categories along with localization annotations.

Action Detection Action Localization +6

StructuredMesh: 3D Structured Optimization of Façade Components on Photogrammetric Mesh Models using Binary Integer Programming

no code implementations 7 Jun 2023 Libin Wang, Han Hu, Qisen Shang, Bo Xu, Qing Zhu

The lack of façade structures in photogrammetric mesh models renders them inadequate for meeting the demands of intricate applications.

object-detection Object Detection

VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale

1 code implementation 25 May 2023 Zhiwei Hao, Jianyuan Guo, Kai Han, Han Hu, Chang Xu, Yunhe Wang

The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results.

Data Augmentation Knowledge Distillation

Improving Heterogeneous Model Reuse by Density Estimation

1 code implementation 23 May 2023 Anke Tang, Yong Luo, Han Hu, Fengxiang He, Kehua Su, Bo Du, Yixin Chen, DaCheng Tao

This paper studies multiparty learning, aiming to learn a model using the private data of different participants.

Density Estimation Selection bias

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

3 code implementations CVPR 2023 Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan

Comprehensive experiments demonstrate EfficientViT outperforms existing efficient models, striking a good trade-off between speed and accuracy.

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

1 code implementation ICCV 2023 Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang

While modeling temporal information within straight through tube is widely adopted in literature, we find that simple frame alignment already provides enough essence without temporal attention.

Action Classification Action Recognition +1

BCE-Net: Reliable Building Footprints Change Extraction based on Historical Map and Up-to-Date Images using Contrastive Learning

1 code implementation 14 Apr 2023 Cheng Liao, Han Hu, Xuekun Yuan, Haifeng Li, Chao Liu, Chunyang Liu, Gui Fu, Yulin Ding, Qing Zhu

This contrastive learning strategy allowed us to inject the semantics of buildings into a pipeline for the detection of changes, which is achieved by increasing the distinguishability of features of buildings from those of non-buildings.

Change Detection Contrastive Learning

SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks

no code implementations 12 Apr 2023 Haojia Yu, Han Hu, Bo Xu, Qisen Shang, Zhendong Wang, Qing Zhu

Most urban applications necessitate building footprints in the form of concise vector graphics with sharp boundaries rather than pixel-wise raster images.

Segmentation Semantic Segmentation +2

Semantic Image Translation for Repairing the Texture Defects of Building Models

no code implementations 30 Mar 2023 Qisen Shang, Han Hu, Haojia Yu, Bo Xu, Libin Wang, Qing Zhu

Experimental results on publicly available façade image and 3D model datasets demonstrate that our method yields superior results and effectively addresses issues associated with flawed textures.

Image Generation Style Transfer +2

Human Pose as Compositional Tokens

1 code implementation CVPR 2023 Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, Han Hu

Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings.

Decoder Pose Estimation

Domain-knowledge Inspired Pseudo Supervision (DIPS) for Unsupervised Image-to-Image Translation Models to Support Cross-Domain Classification

2 code implementations 18 Mar 2023 Firas Al-Hindawi, Md Mahfuzur Rahman Siddiquee, Teresa Wu, Han Hu, Ying Sun

Cross-domain classification frameworks were developed to handle this data domain shift problem by utilizing unsupervised image-to-image translation models to translate an input image from the unlabeled domain to the labeled domain.

domain classification Translation +1

Efficient Diffusion Training via Min-SNR Weighting Strategy

2 code implementations ICCV 2023 Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, Baining Guo

Denoising diffusion models have been a mainstream approach for image generation, however, training these models often suffers from slow convergence.

Denoising Image Generation +2

DeepMIM: Deep Supervision for Masked Image Modeling

1 code implementation 15 Mar 2023 Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which adds extra supervision to the intermediate features of a neural network, was widely used in image classification in the early deep learning era because it significantly reduces training difficulty and eases optimization, for example by avoiding vanishing gradients, compared with vanilla training.

Image Classification object-detection +2

SGDA: Towards 3D Universal Pulmonary Nodule Detection via Slice Grouped Domain Attention

1 code implementation 7 Mar 2023 Rui Xu, Zhi Liu, Yong Luo, Han Hu, Li Shen, Bo Du, Kaiming Kuang, Jiancheng Yang

To address this issue, we propose a slice grouped domain attention (SGDA) module to enhance the generalization capability of the pulmonary nodule detection networks.

Computed Tomography (CT)

Subspace based Federated Unlearning

no code implementations 24 Feb 2023 Guanghao Li, Li Shen, Yan Sun, Yue Hu, Han Hu, DaCheng Tao

Federated learning (FL) enables multiple clients to train a machine learning model collaboratively without exchanging their local data.

Federated Learning

Side Adapter Network for Open-Vocabulary Semantic Segmentation

3 code implementations CVPR 2023 Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai

A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.

Language Modelling Open Vocabulary Semantic Segmentation +3

FedABC: Targeting Fair Competition in Personalized Federated Learning

no code implementations 15 Feb 2023 Dui Wang, Li Shen, Yong Luo, Han Hu, Kehua Su, Yonggang Wen, DaCheng Tao

In particular, we adopt the "one-vs-all" training strategy in each client to alleviate the unfair competition between classes by constructing a personalized binary classification problem for each class.

Binary Classification Personalized Federated Learning

Training-free Lexical Backdoor Attacks on Language Models

1 code implementation 8 Feb 2023 Yujin Huang, Terry Yue Zhuo, Qiongkai Xu, Han Hu, Xingliang Yuan, Chunyang Chen

In this work, we propose Training-Free Lexical Backdoor Attack (TFLexAttack) as the first training-free backdoor attack on language models.

Backdoor Attack Data Poisoning +1

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

1 code implementation ICCV 2023 Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu

With these new techniques and other designs, we show that the proposed general-purpose task-solver can perform both instance segmentation and depth estimation well.

Instance Segmentation Monocular Depth Estimation +1

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

2 code implementations CVPR 2023 Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget.

Image Classification Semantic Segmentation

Improving CLIP Fine-tuning Performance

1 code implementation ICCV 2023 Yixuan Wei, Han Hu, Zhenda Xie, Ze Liu, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

Experiments suggest that the feature map distillation approach significantly boosts the fine-tuning performance of CLIP models on several typical downstream vision tasks.

object-detection Object Detection +1

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition

no code implementations CVPR 2023 Yixuan Wei, Yue Cao, Zheng Zhang, Houwen Peng, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

This paper presents a method that effectively combines two prevalent visual recognition methods, i.e., image classification and contrastive language-image pre-training, dubbed iCLIP.

Classification Image Classification +2

DETR Does Not Need Multi-Scale or Locality Design

1 code implementation ICCV 2023 Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Decoder

Attentive Mask CLIP

1 code implementation ICCV 2023 Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang

To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description.

Contrastive Learning Retrieval +1

ResFormer: Scaling ViTs with Multi-Resolution Training

1 code implementation CVPR 2023 Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang

We introduce ResFormer, a framework that is built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions.

Action Recognition Image Classification +4

Exploring Discrete Diffusion Models for Image Captioning

1 code implementation 21 Nov 2022 Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.

Image Captioning Image Generation

Could Giant Pretrained Image Models Extract Universal Representations?

no code implementations 3 Nov 2022 Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao

In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition.

Action Recognition In Videos Instance Segmentation +5

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

4 code implementations 3 Oct 2022 Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, WeiHong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu

Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens.

Clustering Depth Estimation +6

One-to-Many Semantic Communication Systems: Design, Implementation, Performance Evaluation

no code implementations 20 Sep 2022 Han Hu, Xingwu Zhu, Fuhui Zhou, Wei Wu, Rose Qingyang Hu, Hongbo Zhu

To effectively exploit the benefits enabled by semantic communication, in this paper, we propose a one-to-many semantic communication system.

Semantic Communication Transfer Learning

Not All Instances Contribute Equally: Instance-adaptive Class Representation Learning for Few-Shot Visual Recognition

no code implementations 7 Sep 2022 Mengya Han, Yibing Zhan, Yong Luo, Bo Du, Han Hu, Yonggang Wen, DaCheng Tao

To address the above issues, we propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition.

Meta-Learning Representation Learning

Leveraging GAN Priors for Few-Shot Part Segmentation

1 code implementation 27 Jul 2022 Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Bo Du

Overall, this work is an attempt to explore the internal relevance between generation tasks and perception tasks by prompt designing.

Image Generation Segmentation

DETRs with Hybrid Matching

8 code implementations CVPR 2023 Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, WeiHong Lin, Lei Sun, Chao Zhang, Han Hu

One-to-one set matching is a key design for DETR to establish its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) to remove duplicate detections.

Object Detection Pose Estimation +2

Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning

1 code implementation 26 Jul 2022 Phung Lai, Han Hu, NhatHai Phan, Ruoming Jin, My T. Thai, An M. Chen

In this paper, we show that the process of continually learning new tasks and memorizing previous tasks introduces unknown privacy risks and challenges to bound the privacy loss.

BIG-bench Machine Learning

Tutel: Adaptive Mixture-of-Experts at Scale

2 code implementations 7 Jun 2022 Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Yongqiang Xiong

On efficiency, Flex accelerates SwinV2-MoE, achieving up to 1.55x and 2.11x speedup in training and inference over Fairseq, respectively.

Object Detection

Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation

1 code implementation 27 May 2022 Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

These properties, which we aggregately refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnosis tools.

Ranked #2 on Instance Segmentation on COCO test-dev (using extra training data)

Contrastive Learning Image Classification +5

Revealing the Dark Secrets of Masked Image Modeling

1 code implementation CVPR 2023 Zhenda Xie, Zigang Geng, Jingcheng Hu, Zheng Zhang, Han Hu, Yue Cao

In this paper, we compare MIM with the long-dominant supervised pre-trained models from two perspectives, the visualizations and the experiments, to uncover their key representational differences.

Diversity Inductive Bias +4

CDFKD-MFS: Collaborative Data-free Knowledge Distillation via Multi-level Feature Sharing

1 code implementation 24 May 2022 Zhiwei Hao, Yong Luo, Zhi Wang, Han Hu, Jianping An

To tackle this challenge, we propose a framework termed collaborative data-free knowledge distillation via multi-level feature sharing (CDFKD-MFS), which consists of a multi-header student module, an asymmetric adversarial data-free KD module, and an attention-based aggregation module.

Data-free Knowledge Distillation

Multi-Agent Collaborative Inference via DNN Decoupling: Intermediate Feature Compression and Edge Learning

1 code implementation 24 May 2022 Zhiwei Hao, Guanyu Xu, Yong Luo, Han Hu, Jianping An, Shiwen Mao

In this paper, we study the multi-agent collaborative inference scenario, where a single edge server coordinates the inference of multiple UEs.

Collaborative Inference Feature Compression

Deeper Insights into the Robustness of ViTs towards Common Corruptions

no code implementations 26 Apr 2022 Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu-Gang Jiang

With Vision Transformers (ViTs) making great advances in a variety of computer vision tasks, recent literature has proposed various variants of vanilla ViTs to achieve better efficiency and efficacy.

Benchmarking Data Augmentation

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition

no code implementations 22 Apr 2022 Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

Second, we convert the image classification problem from learning parametric category classifier weights to learning a text encoder as a meta network to generate category classifier weights.

Action Recognition Classification +7

Enhancing the Robustness, Efficiency, and Diversity of Differentiable Architecture Search

no code implementations 10 Apr 2022 Chao Li, Jia Ning, Han Hu, Kun He

Differentiable architecture search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency.

Diversity

RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation

2 code implementations 8 Mar 2022 Haodi He, Yuhui Yuan, Xiangyu Yue, Han Hu

Given an input image or video, our framework first conducts multi-label classification over the complete label set, then sorts the labels and selects a small subset according to their class confidence scores.

Classification Instance Segmentation +6

Energy Efficiency and Delay Tradeoff in an MEC-Enabled Mobile IoT Network

no code implementations 8 Feb 2022 Han Hu, Weiwei Song, Qun Wang, Rose Qingyang Hu, Hongbo Zhu

Theoretical analysis proves that the proposed algorithm can achieve a $[O(1/V), O(V)]$ tradeoff between EE and service delay.

Edge-computing Stochastic Optimization

Semi-Supervised Adversarial Recognition of Refined Window Structures for Inverse Procedural Façade Modeling

no code implementations 22 Jan 2022 Han Hu, Xinrong Liang, Yulin Ding, Qisen Shang, Bo Xu, Xuming Ge, Min Chen, Ruofei Zhong, Qing Zhu

Unfortunately, the large amount of interactive sample labeling effort required has dramatically hindered the application of deep learning methods, especially for 3D modeling tasks, which require heterogeneous samples.

Generative Adversarial Network

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

2 code implementations 29 Dec 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai

However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation operates on pixels, while CLIP operates on images.

Image Classification Language Modelling +8

Swin Transformer V2: Scaling Up Capacity and Resolution

20 code implementations CVPR 2022 Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.

Ranked #4 on Image Classification on ImageNet V2 (using extra training data)

Action Classification Image Classification +3

SimMIM: A Simple Framework for Masked Image Modeling

5 code implementations CVPR 2022 Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu

We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), which, using $40\times$ less data than in previous practice, achieves state-of-the-art results on four representative vision benchmarks.

Representation Learning Self-Supervised Image Classification +1

FLSys: Toward an Open Ecosystem for Federated Learning Mobile Apps

no code implementations 17 Nov 2021 Xiaopeng Jiang, Han Hu, Vijaya Datta Mayyuri, An Chen, Devu M. Shila, Adriaan Larmuseau, Ruoming Jin, Cristian Borcea, NhatHai Phan

This article presents the design, implementation, and evaluation of FLSys, a mobile-cloud federated learning (FL) system, which can be a key component for an open ecosystem of FL models and apps.

Data Augmentation Federated Learning +3

Bootstrap Your Object Detector via Mixed Training

1 code implementation NeurIPS 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai

We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.

Data Augmentation Missing Labels +3

Joint Task Offloading and Resource Allocation for IoT Edge Computing with Sequential Task Dependency

no code implementations 23 Oct 2021 Xuming An, Rongfei Fan, Han Hu, Ning Zhang, Saman Atapattu, Theodoros A. Tsiftsis

To solve this challenging problem, we decompose it into a one-dimensional search over the task offloading decision and a non-convex optimization problem for a given task offloading decision.

Edge-computing

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning

1 code implementation NeurIPS 2021 Hanzhe Hu, Fangyun Wei, Han Hu, Qiwei Ye, Jinshi Cui, LiWei Wang

The confidence bank is leveraged as an indicator to tilt training towards under-performing categories, instantiated in three strategies: 1) adaptive Copy-Paste and CutMix data augmentation approaches, which give under-performing categories a greater chance of being copied or cut; 2) an adaptive data sampling approach that encourages pixels from under-performing categories to be sampled; 3) a simple yet effective re-weighting method to alleviate the training noise introduced by pseudo-labeling.

Data Augmentation Semi-Supervised Semantic Segmentation

Meta-learning an Intermediate Representation for Few-shot Block-wise Prediction of Landslide Susceptibility

1 code implementation 3 Oct 2021 Li Chen, Yulin Ding, Saeid Pirasteh, Han Hu, Qing Zhu, Haowei Zeng, Haojia Yu, Qisen Shang, Yongfei Song

The critical problem, then, is that in each block with limited samples, training and testing a model cannot yield a satisfactory LSM prediction, especially in dangerous mountainous areas where landslide surveying is expensive.

Meta-Learning

Energy-Efficient Design for IRS-Assisted MEC Networks with NOMA

no code implementations 19 Sep 2021 Qun Wang, Fuhui Zhou, Han Hu, Rose Qingyang Hu

Energy-efficient design is of crucial importance in wireless internet of things (IoT) networks.

Edge-computing

Video Swin Transformer

14 code implementations CVPR 2022 Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, Han Hu

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

Ranked #28 on Action Classification on Kinetics-600 (using extra training data)

Action Classification Action Recognition +5

End-to-End Semi-Supervised Object Detection with Soft Teacher

8 code implementations ICCV 2021 Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu

This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.

Instance Segmentation object-detection +4

Aligning Pretraining for Detection via Object-Level Contrastive Learning

2 code implementations NeurIPS 2021 Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin

Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.

Contrastive Learning Object +6

TENSILE: A Tensor granularity dynamic GPU memory scheduling method toward multiple dynamic workloads system

no code implementations 27 May 2021 Kaixin Zhang, Hongzhi Wang, Han Hu, Songling Zou, Jiye Qiu, Tongxin Li, Zhishun Wang

In this paper, we demonstrate TENSILE, a method for managing GPU memory at tensor granularity to reduce the GPU memory peak under multiple dynamic workloads.

Management Scheduling

Group-Free 3D Object Detection via Transformers

4 code implementations ICCV 2021 Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong

Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in Transformers (Vaswani et al., 2017), where the contribution of each point is automatically learned in the network training.

3D Object Detection Object +1

Capsule Network is Not More Robust than Convolutional Network

no code implementations CVPR 2021 Jindong Gu, Volker Tresp, Han Hu

The examination reveals five major new/different components in CapsNet: a transformation process, a dynamic routing layer, a squashing function, a marginal loss in place of the cross-entropy loss, and an additional class-conditional reconstruction loss for regularization.

Image Classification

Boosting Adversarial Transferability through Enhanced Momentum

1 code implementation 19 Mar 2021 Xiaosen Wang, Jiadong Lin, Han Hu, Jingdong Wang, Kun He

Various momentum iterative gradient-based methods have been shown to be effective at improving adversarial transferability.

Adversarial Attack

Mobility-Aware Offloading and Resource Allocation in MEC-Enabled IoT Networks

no code implementations 16 Mar 2021 Han Hu, Weiwei Song, Qun Wang, Fuhui Zhou, Rose Qingyang Hu

In this paper, the offloading decision and resource allocation problem is studied with mobility consideration.

Autonomous Driving Edge-computing

Secure and Energy-Efficient Offloading and Resource Allocation in a NOMA-Based MEC Network

no code implementations 9 Feb 2021 Qun Wang, Han Hu, Haijian Sun, Rose Qingyang Hu

In this paper, we study the task offloading and resource allocation problem in a non-orthogonal multiple access (NOMA) assisted MEC network with security and energy efficiency considerations.

Edge-computing

Robustness of on-device Models: Adversarial Attack to Deep Learning Models on Android Apps

1 code implementation 12 Jan 2021 Yujin Huang, Han Hu, Chunyang Chen

Deep learning has shown its power in many applications, including object detection in images, natural-language understanding, and speech recognition.

Adversarial Attack Image Classification +3

Leveraging Batch Normalization for Vision Transformers

no code implementations ICCVW 2021 Zhuliang Yao, Yue Cao, Yutong Lin, Ze Liu, Zheng Zhang, Han Hu

Transformer-based vision architectures have attracted great attention because of their strong performance relative to convolutional neural networks (CNNs).

Global Context Networks

3 code implementations 24 Dec 2020 Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu

The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position.

Instance Segmentation Object Detection

Evading Web Application Firewalls with Reinforcement Learning

no code implementations CUHK Course IERG5350 2020 Xianbo Wang, Han Hu

Our framework successfully discovered a number of evasion payloads for each WAF in our experiments and significantly outperforms the baseline policy.

OpenAI Gym reinforcement-learning +1

Depth-Enhanced Feature Pyramid Network for Occlusion-Aware Verification of Buildings from Oblique Images

no code implementations 26 Nov 2020 Qing Zhu, Shengzhi Huang, Han Hu, Haifeng Li, Min Chen, Ruofei Zhong

Finally, multi-view information from both the nadir and oblique images is used in a robust voting procedure to label changes in existing buildings.

Joint Task Offloading and Resource Allocation for IoT Edge Computing with Sequential Task Dependency

no code implementations 25 Nov 2020 Xuming An, Rongfei Fan, Han Hu, Ning Zhang, Saman Atapattu, Theodoros A. Tsiftsis

To solve this challenging problem, we decompose it into a one-dimensional search over the task offloading decision and a non-convex optimization problem for a given task offloading decision.

Edge-computing Information Theory

Structure-Aware Completion of Photogrammetric Meshes in Urban Road Environment

1 code implementation 23 Nov 2020 Qing Zhu, Qisen Shang, Han Hu, Haojia Yu, Ruofei Zhong

Finally, the completed rendered image is deintegrated to the original texture atlas and the triangles for the vehicles are also flattened for improved meshes.

Object Detection

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning

7 code implementations CVPR 2021 Zhenda Xie, Yutong Lin, Zheng Zhang, Yue Cao, Stephen Lin, Han Hu

We argue that the power of contrastive learning has yet to be fully unleashed, as current methods are trained only on instance-level pretext tasks, leading to representations that may be sub-optimal for downstream tasks requiring dense pixel predictions.

Contrastive Learning object-detection +3

RepPoints V2: Verification Meets Regression for Object Detection

1 code implementation NeurIPS 2020 Yihong Chen, Zheng Zhang, Yue Cao, Li-Wei Wang, Stephen Lin, Han Hu

Though RepPoints provides high performance, we find that its heavy reliance on regression for object localization leaves room for improvement.

Instance Segmentation Object +6

A Closer Look at Local Aggregation Operators in Point Cloud Analysis

1 code implementation ECCV 2020 Ze Liu, Han Hu, Yue Cao, Zheng Zhang, Xin Tong

Our investigation reveals that, despite their different designs, these operators make surprisingly similar contributions to network performance under the same network input and feature numbers, and yield state-of-the-art accuracy on standard benchmarks.

3D Semantic Segmentation

Disentangled Non-Local Neural Networks

5 code implementations ECCV 2020 Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu

This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel.

Ranked #20 on Semantic Segmentation on Cityscapes test (using extra training data)

Action Recognition object-detection +2

Ontology-based Interpretable Machine Learning for Textual Data

2 code implementations 1 Apr 2020 Phung Lai, NhatHai Phan, Han Hu, Anuja Badeti, David Newman, Dejing Dou

In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models.

BIG-bench Machine Learning Interpretable Machine Learning

Memory Enhanced Global-Local Aggregation for Video Object Detection

2 code implementations CVPR 2020 Yihong Chen, Yue Cao, Han Hu, Li-Wei Wang

We argue that there are two important cues for humans to recognize objects in videos: the global semantic information and the local localization information.

Object object-detection +1

Deep Fusion of Local and Non-Local Features for Precision Landslide Recognition

1 code implementation 20 Feb 2020 Qing Zhu, Lin Chen, Han Hu, Binzhi Xu, Yeting Zhang, Haifeng Li

The second uses a scale attention mechanism to guide the up-sampling of features from the coarse level by a learned weight map.

Semantic Segmentation

Fast and Regularized Reconstruction of Building Façades from Street-View Images using Binary Integer Programming

1 code implementation 20 Feb 2020 Han Hu, Libin Wang, Mier Zhang, Yulin Ding, Qing Zhu

Regularized arrangement of primitives on building façades to aligned locations and consistent sizes is important for structured reconstruction of urban environments.

3D Reconstruction

Dense RepPoints: Representing Visual Objects with Dense Point Sets

2 code implementations ECCV 2020 Ze Yang, Yinghao Xu, Han Xue, Zheng Zhang, Raquel Urtasun, Li-Wei Wang, Stephen Lin, Han Hu

We present a new object representation, called Dense RepPoints, that utilizes a large set of points to describe an object at multiple levels, including both box level and pixel level.

Object Object Detection