Search Results for author: Lu Hou

Found 20 papers, 7 papers with code

CTRL: Connect Tabular and Language Model for CTR Prediction

no code implementations 5 Jun 2023 Xiangyang Li, Bo Chen, Lu Hou, Ruiming Tang

Both tabular data and converted textual data are regarded as two different modalities and are separately fed into the collaborative CTR model and pre-trained language model.

Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding

no code implementations 19 Dec 2022 Haoli Bai, Zhiguang Liu, Xiaojun Meng, Wentao Li, Shuang Liu, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu

While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far.

Contrastive Learning Optical Character Recognition (OCR) +1

Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction

no code implementations 12 Dec 2022 Shiwei Li, Huifeng Guo, Lu Hou, Wei Zhang, Xing Tang, Ruiming Tang, Rui Zhang, Ruixuan Li

To this end, we formulate a novel quantization training paradigm to compress the embeddings from the training stage, termed low-precision training (LPT).

Click-Through Rate Prediction Quantization

Compression of Generative Pre-trained Language Models via Quantization

no code implementations ACL 2022 Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong

We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity, and the varied distribution of weights.

Model Compression Quantization +1

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

no code implementations Findings (ACL) 2022 Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung

Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.

Image Captioning Knowledge Distillation +4

FILIP: Fine-grained Interactive Language-Image Pre-Training

no code implementations ICLR 2022 Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu

In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.

Image Classification Retrieval +2
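
The token-wise maximum similarity mentioned in the FILIP entry above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity, and the averaging over tokens are assumptions made for illustration, and token embeddings are assumed to be non-zero vectors of equal length.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def tokenwise_max_similarity(query_tokens, key_tokens):
    """Late-interaction score from one token sequence to another.

    For each query token, take its maximum cosine similarity over all
    key tokens, then average these maxima over the query tokens.
    Hypothetical helper: name and averaging are illustrative assumptions.
    """
    queries = [l2_normalize(v) for v in query_tokens]
    keys = [l2_normalize(v) for v in key_tokens]
    per_token_max = [
        max(sum(a * b for a, b in zip(q, k)) for k in keys)
        for q in queries
    ]
    return sum(per_token_max) / len(per_token_max)


# Toy example: two image patch embeddings and one text token embedding.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0]]

image_to_text = tokenwise_max_similarity(image_tokens, text_tokens)
text_to_image = tokenwise_max_similarity(text_tokens, image_tokens)
print(image_to_text, text_to_image)
```

The score is asymmetric: the image-to-text and text-to-image directions generally differ, so a contrastive objective built on it would typically use both. How the scores enter the loss is left out here because the snippet above does not specify it.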

Towards Efficient Post-training Quantization of Pre-trained Language Models

no code implementations 30 Sep 2021 Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.


GhostBERT: Generate More Features with Cheap Operations for BERT

no code implementations ACL 2021 Zhiqi Huang, Lu Hou, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Transformer-based pre-trained language models like BERT, though powerful in many tasks, are expensive in both memory and computation, due to their large number of parameters.

Improved OOD Generalization via Adversarial Training and Pre-training

no code implementations 24 May 2021 Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.

Image Classification Natural Language Understanding

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

no code implementations ICLR 2021 Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).

Image Augmentation Image Classification +1

TernaryBERT: Distillation-aware Ultra-low Bit BERT

2 code implementations EMNLP 2020 Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices.

Knowledge Distillation Quantization

DynaBERT: Dynamic BERT with Adaptive Width and Depth

3 code implementations NeurIPS 2020 Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive.

Language Modelling

Normalization Helps Training of Quantized LSTM

1 code implementation NeurIPS 2019 Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu

The long-short-term memory (LSTM), though powerful, is memory and computation expensive.


Analysis of Quantized Models

no code implementations ICLR 2019 Lu Hou, Ruiliang Zhang, James T. Kwok

We show that (i) weight-quantized networks converge to an error related to the weight quantization resolution and weight dimension; (ii) quantizing gradients slows convergence by a factor related to the gradient quantization resolution and dimension; and (iii) clipping the gradient before quantization renders this factor dimension-free, thus allowing the use of fewer bits for gradient quantization.


Power Law in Sparsified Deep Neural Networks

no code implementations 4 May 2018 Lu Hou, James T. Kwok

The power law has been observed in the degree distributions of many biological neural networks.

Continual Learning

Loss-aware Binarization of Deep Networks

1 code implementation 5 Nov 2016 Lu Hou, Quanming Yao, James T. Kwok

Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of space and time.

