Search Results for author: Lu Hou

Found 36 papers, 20 papers with code

UNIT: Unifying Image and Text Recognition in One Vision Encoder

no code implementations6 Sep 2024 Yi Zhu, Yanpeng Zhou, Chunwei Wang, Yang Cao, Jianhua Han, Lu Hou, Hang Xu

Starting with a vision encoder pre-trained with image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities.

Embedding Compression in Recommender Systems: A Survey

no code implementations5 Aug 2024 Shiwei Li, Huifeng Guo, Xing Tang, Ruiming Tang, Lu Hou, Ruixuan Li, Rui Zhang

In this survey, we provide a comprehensive review of embedding compression approaches in recommender systems.

Recommendation Systems

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

no code implementations11 Jul 2024 Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei zhang, Xiaodan Liang

High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities.

Position

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

1 code implementation31 May 2024 Linli Yao, Lei LI, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu sun, Lu Hou

Specifically, we trace back the semantic relevance flow from generated language tokens to raw visual encoder patches and the intermediate outputs produced by projectors.

cross-modal alignment Visual Localization +1

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

1 code implementation28 Mar 2024 Sishuo Chen, Lei LI, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu sun, Lu Hou

Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries.

Data Augmentation Diversity +1

Visually Guided Generative Text-Layout Pre-training for Document Intelligence

1 code implementation25 Mar 2024 Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong

Prior study shows that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to gain abilities to perceive and reason both document texts and layouts (e. g., locations of texts and table-cells).

Document Classification document understanding +2

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

no code implementations CVPR 2024 Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying WEI, Zhenan Sun

We find that directly using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance.

IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact

2 code implementations2 Mar 2024 Ruikang Liu, Haoli Bai, Haokun Lin, Yuening Li, Han Gao, Zhengzhuo Xu, Lu Hou, Jun Yao, Chun Yuan

Such outliers are found to allocate most of the attention scores on initial tokens of input, termed as pivot tokens, which are crucial to the performance of quantized LLMs.

Language Modelling Large Language Model +1

TempCompass: Do Video LLMs Really Understand Videos?

1 code implementation1 Mar 2024 Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei LI, Sishuo Chen, Xu sun, Lu Hou

Motivated by these two problems, we propose the \textbf{TempCompass} benchmark, which introduces a diversity of temporal aspects and task formats.

Diversity

Extending Context Window of Large Language Models via Semantic Compression

no code implementations15 Dec 2023 Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses.

Few-Shot Learning Information Retrieval +4

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

2 code implementations CVPR 2024 Shuhuai Ren, Linli Yao, Shicheng Li, Xu sun, Lu Hou

This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.

Ranked #2 on Video-Text Retrieval on Test-of-Time (using extra training data)

Dense Captioning Highlight Detection +8

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

1 code implementation29 Oct 2023 Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu sun, Lu Hou

TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding.

 Ranked #1 on Video Retrieval on Condensed Movies (using extra training data)

Language Modelling Retrieval +2

mCLIP: Multilingual CLIP via Cross-lingual Transfer

1 code implementation ACL 2023 Guanhua Chen, Lu Hou, Yun Chen, Wenliang Dai, Lifeng Shang, Xin Jiang, Qun Liu, Jia Pan, Wenping Wang

Furthermore, to enhance the token- and sentence-level multilingual representation of the MTE, we propose to train it with machine translation and contrastive learning jointly before the TriKD to provide a better initialization.

Contrastive Learning Cross-Lingual Transfer +7

CTRL: Connect Collaborative and Language Model for CTR Prediction

no code implementations5 Jun 2023 Xiangyang Li, Bo Chen, Lu Hou, Ruiming Tang

Both tabular data and converted textual data are regarded as two different modalities and are separately fed into the collaborative CTR model and pre-trained language model.

Click-Through Rate Prediction Language Modelling +1

Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding

1 code implementation19 Dec 2022 Haoli Bai, Zhiguang Liu, Xiaojun Meng, Wentao Li, Shuang Liu, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu

While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far.

Contrastive Learning document understanding +2

Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction

no code implementations12 Dec 2022 Shiwei Li, Huifeng Guo, Lu Hou, Wei zhang, Xing Tang, Ruiming Tang, Rui Zhang, Ruixuan Li

To this end, we formulate a novel quantization training paradigm to compress the embeddings from the training stage, termed low-precision training (LPT).

Click-Through Rate Prediction Quantization

Compression of Generative Pre-trained Language Models via Quantization

no code implementations ACL 2022 Chaofan Tao, Lu Hou, Wei zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong

We find that previous quantization methods fail on generative tasks due to the \textit{homogeneous word embeddings} caused by reduced capacity, and \textit{varied distribution of weights}.

Model Compression Quantization +1

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

no code implementations Findings (ACL) 2022 Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung

Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.

Image Captioning Knowledge Distillation +4

FILIP: Fine-grained Interactive Language-Image Pre-Training

1 code implementation ICLR 2022 Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu

In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.

Image Classification Image-text Retrieval +2

Towards Efficient Post-training Quantization of Pre-trained Language Models

no code implementations30 Sep 2021 Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.

Quantization

GhostBERT: Generate More Features with Cheap Operations for BERT

no code implementations ACL 2021 Zhiqi Huang, Lu Hou, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Transformer-based pre-trained language models like BERT, though powerful in many tasks, are expensive in both memory and computation, due to their large number of parameters.

Improved OOD Generalization via Adversarial Training and Pre-training

no code implementations24 May 2021 Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.

Image Classification Natural Language Understanding

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

1 code implementation ICLR 2021 Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i. e., harder examples).

Image Augmentation Image Classification +1

TernaryBERT: Distillation-aware Ultra-low Bit BERT

5 code implementations EMNLP 2020 Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices.

Knowledge Distillation Quantization

DynaBERT: Dynamic BERT with Adaptive Width and Depth

3 code implementations NeurIPS 2020 Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive.

Language Modelling

Normalization Helps Training of Quantized LSTM

1 code implementation NeurIPS 2019 Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu

The long-short-term memory (LSTM), though powerful, is memory and computa\x02tion expensive.

Quantization

Analysis of Quantized Models

no code implementations ICLR 2019 Lu Hou, Ruiliang Zhang, James T. Kwok

We show that (i) weight-quantized networks converge to an error related to the weight quantization resolution and weight dimension; (ii) quantizing gradients slows convergence by a factor related to the gradient quantization resolution and dimension; and (iii) clipping the gradient before quantization renders this factor dimension-free, thus allowing the use of fewer bits for gradient quantization.

Quantization

Power Law in Sparsified Deep Neural Networks

no code implementations4 May 2018 Lu Hou, James T. Kwok

The power law has been observed in the degree distributions of many biological neural networks.

Continual Learning

Loss-aware Binarization of Deep Networks

1 code implementation5 Nov 2016 Lu Hou, Quanming Yao, James T. Kwok

Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of space and time.

Binarization

Cannot find the paper you are looking for? You can Submit a new open access paper.