Search Results for author: Hailin Hu

Found 10 papers, 3 papers with code

A Survey on Transformer Compression

no code implementations 5 Feb 2024 Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, DaCheng Tao

Model compression methods reduce the memory and computational cost of Transformers, a necessary step for deploying large language/vision models on practical devices.

Knowledge Distillation · Model Compression · +1
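
One of the compression families named in the tags above is knowledge distillation. As a point of reference only, a minimal generic sketch of response-based distillation is given below; the teacher/student models, temperature, and mixing weight are illustrative placeholders, not details drawn from the survey.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic response-based KD loss: soft targets from the teacher
    blended with ordinary cross-entropy on the hard labels."""
    # Softened teacher/student distributions at temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term is scaled by T^2 so its gradient keeps a comparable magnitude.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random logits for a 4-way classification batch.
student_logits = torch.randn(8, 4, requires_grad=True)
teacher_logits = torch.randn(8, 4)
labels = torch.randint(0, 4, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```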

PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation

no code implementations 27 Dec 2023 Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, DaCheng Tao

We then demonstrate, through carefully designed ablations, that the proposed approach is highly effective at enhancing model nonlinearity; on this basis, we present a new efficient architecture for modern language models, namely PanGu-$\pi$.

Language Modelling
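
The snippet only states that the approach enhances model nonlinearity. As a loose illustration of what "adding nonlinearity" to a feed-forward block can look like, here is a toy module that sums several shifted, scaled activations; this is an assumption-laden sketch, not the PanGu-$\pi$ design.

```python
import torch
import torch.nn as nn

class SeriesActivation(nn.Module):
    """Illustrative 'extra nonlinearity' module: a sum of activations applied
    at learnable input shifts. A sketch only; the snippet above does not
    spell out the paper's actual mechanism."""
    def __init__(self, num_terms: int = 3):
        super().__init__()
        self.shifts = nn.Parameter(torch.linspace(-1.0, 1.0, num_terms))
        self.scales = nn.Parameter(torch.ones(num_terms))

    def forward(self, x):
        # Accumulate several shifted/scaled GELU responses.
        out = torch.zeros_like(x)
        for shift, scale in zip(self.shifts, self.scales):
            out = out + scale * torch.nn.functional.gelu(x + shift)
        return out

ffn = nn.Sequential(nn.Linear(64, 256), SeriesActivation(), nn.Linear(256, 64))
print(ffn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```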

GenDet: Towards Good Generalizations for AI-Generated Image Detection

1 code implementation 12 Dec 2023 Mingjian Zhu, Hanting Chen, Mouxiao Huang, Wei Li, Hailin Hu, Jie Hu, Yunhe Wang

The misuse of AI imagery can have harmful societal effects, prompting the creation of detectors to combat issues like the spread of fake news.

Anomaly Detection

Data-Free Distillation of Language Model by Text-to-Text Transfer

no code implementations 3 Nov 2023 Zheyuan Bai, Xinduo Liu, Hailin Hu, Tianyu Guo, Qinghua Zhang, Yunhe Wang

Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing a model when the original training data is unavailable.

Data-free Knowledge Distillation · Language Modelling · +4
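
Since the snippet only defines DFKD, the sketch below shows the general data-free recipe for language models under stated assumptions: synthesize transfer text with some generator, then match the student's output distribution to the teacher's on that synthetic text. The tiny models and the random "generator" are placeholders, not the paper's text-to-text transfer procedure.

```python
import torch
import torch.nn.functional as F

# Placeholder "teacher" and "student" language models over a tiny vocabulary;
# in practice these would be real pretrained Transformers.
vocab, hidden = 100, 32
teacher = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden), torch.nn.Linear(hidden, vocab))
student = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden), torch.nn.Linear(hidden, vocab))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def synthesize_batch(batch_size=4, seq_len=16):
    # Stand-in for a generator that produces transfer text when the original
    # training corpus is unavailable (here: random token ids).
    return torch.randint(0, vocab, (batch_size, seq_len))

for step in range(10):
    tokens = synthesize_batch()
    with torch.no_grad():
        teacher_logits = teacher(tokens)
    student_logits = student(tokens)
    # Match the student's per-token distribution to the teacher's.
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```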

Less is More: Focus Attention for Efficient DETR

4 code implementations ICCV 2023 Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang

DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models.

Finding One Missing Puzzle of Contextual Word Embedding: Representing Contexts as Manifold

no code implementations 29 Sep 2021 Hailin Hu, Rong Yao, Cheng Li

The current understanding of contextual word embedding interprets the representation by associating each token with a vector that is dynamically modulated by its context.
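
To make "dynamically modulated by the context" concrete, the toy example below (not from the paper) passes static embeddings through a single self-attention layer, so the same token ends up with different vectors in different sentences.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = {"the": 0, "bank": 1, "river": 2, "loan": 3, "near": 4, "gave": 5, "a": 6}
embed = nn.Embedding(len(vocab), 16)          # static (context-free) vectors
mixer = nn.MultiheadAttention(16, num_heads=2, batch_first=True)

def contextual(sentence):
    ids = torch.tensor([[vocab[w] for w in sentence]])
    x = embed(ids)
    out, _ = mixer(x, x, x)                   # each token attends to its context
    return out[0]

s1 = contextual(["the", "bank", "near", "the", "river"])
s2 = contextual(["the", "bank", "gave", "a", "loan"])
# The static vector of "bank" is identical in both sentences; the contextual one is not.
print(torch.allclose(s1[1], s2[1]))           # False: the context changed the vector
```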

How Well Does Self-Supervised Pre-Training Perform with Streaming ImageNet?

no code implementations NeurIPS Workshop ImageNet_PPF 2021 Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng

Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.

Self-Supervised Learning

How Well Does Self-Supervised Pre-Training Perform with Streaming Data?

no code implementations ICLR 2022 Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng

Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.

Representation Learning · Self-Supervised Learning
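
The contrast drawn above, between joint training on all unlabeled data at once and training on data that arrives sequentially, can be sketched as a loop over data chunks. The encoder, the self-supervised loss, and the chunking below are placeholders rather than the paper's protocol.

```python
import torch

model = torch.nn.Linear(128, 64)                      # stand-in encoder
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def ssl_loss(features):
    # Placeholder self-supervised objective (real methods would use,
    # e.g., contrastive or masked-prediction losses).
    return features.pow(2).mean()

stream = [torch.randn(256, 128) for _ in range(5)]    # unlabeled data arriving in chunks

# Joint training would concatenate all chunks first:
#   all_data = torch.cat(stream)
# Streaming (sequential) training instead updates on each chunk as it arrives,
# without revisiting earlier chunks.
for chunk in stream:
    for _ in range(3):                                # a few passes per chunk
        loss = ssl_loss(model(chunk))
        opt.zero_grad()
        loss.backward()
        opt.step()
```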

DATSING: Data Augmented Time Series Forecasting with Adversarial Domain Adaptation

no code implementations ACM International Conference on Information & Knowledge Management 2020 Hailin Hu, Mingjian Tang, Chengcheng Bai

In this work, we have developed DATSING, a transfer learning-based framework that effectively leverages cross-domain time series latent representations to augment target-domain forecasting.

Domain Adaptation · Time Series · +2
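
As a generic illustration of the adversarial domain adaptation named in the title, and not DATSING's exact architecture, a common construction feeds the shared latent representation to a domain discriminator through a gradient reversal layer, pushing the encoder toward features that transfer from source to target series.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated gradient on the way back."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

encoder = nn.GRU(input_size=1, hidden_size=32, batch_first=True)   # shared encoder
forecaster = nn.Linear(32, 1)                                       # predicts the next value
domain_clf = nn.Linear(32, 2)                                        # source vs. target

src = torch.randn(16, 24, 1)      # placeholder source-domain windows
tgt = torch.randn(16, 24, 1)      # placeholder target-domain windows
src_next = torch.randn(16, 1)

_, h_src = encoder(src)
_, h_tgt = encoder(tgt)
h_src, h_tgt = h_src[-1], h_tgt[-1]

forecast_loss = nn.functional.mse_loss(forecaster(h_src), src_next)
domain_logits = domain_clf(GradReverse.apply(torch.cat([h_src, h_tgt])))
domain_labels = torch.cat([torch.zeros(16, dtype=torch.long), torch.ones(16, dtype=torch.long)])
# The discriminator tries to tell domains apart; the reversed gradient pushes
# the encoder toward domain-invariant representations.
loss = forecast_loss + nn.functional.cross_entropy(domain_logits, domain_labels)
loss.backward()
```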

DensE: An Enhanced Non-commutative Representation for Knowledge Graph Embedding with Adaptive Semantic Hierarchy

1 code implementation 11 Aug 2020 Haonan Lu, Hailin Hu, Xiaodong Lin

This design principle leads to several advantages of our method:
(1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real-world applications;
(2) our model preserves the natural interaction between relational operations and entity embeddings;
(3) the scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities;
(4) the enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and
(5) modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns.

Computational Efficiency · Entity Embeddings · +2
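
The advantages above refer to relation operators that act on entity vectors in Euclidean space via rotation and scaling. Below is a minimal sketch of that kind of scoring function, assuming 3-D entity embeddings, a per-relation axis-angle rotation plus a scaling, and a distance-based score; the actual DensE parameterization may differ.

```python
import torch

def axis_angle_rotation(axis, angle):
    """Rotation matrix about a unit axis (Rodrigues' formula)."""
    axis = axis / axis.norm()
    x, y, z = axis.tolist()
    K = torch.tensor([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return torch.eye(3) + torch.sin(angle) * K + (1 - torch.cos(angle)) * (K @ K)

def score(head, rel_axis, rel_angle, rel_scale, tail):
    """Distance-based score: rotate and scale the head, compare to the tail.
    Higher (less negative) means the triple is more plausible.
    Note that 3-D rotations generally do not commute, matching point (1) above."""
    R = axis_angle_rotation(rel_axis, rel_angle)
    return -torch.norm(rel_scale * (R @ head) - tail)

head = torch.randn(3)
tail = torch.randn(3)
axis = torch.randn(3)
print(score(head, rel_axis=axis, rel_angle=torch.tensor(0.7),
            rel_scale=torch.tensor(1.3), tail=tail))
```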
