no code implementations • 6 Sep 2024 • Yi Zhu, Yanpeng Zhou, Chunwei Wang, Yang Cao, Jianhua Han, Lu Hou, Hang Xu
Starting with a vision encoder pre-trained on image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities.
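As a rough illustration of the layout described above, here is a minimal sketch wiring a shared vision encoder to the two lightweight decoders; all module names and sizes are hypothetical, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class UNITSketch(nn.Module):
    """Hypothetical skeleton: one vision encoder, two lightweight decoders."""
    def __init__(self, dim=768, vocab=32000):
        super().__init__()
        # Vision backbone pre-trained on image recognition (12 layers here).
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.vision_encoder = nn.TransformerEncoder(enc_layer, num_layers=12)
        # Lightweight language decoder predicting text tokens from visual features.
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.language_decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.text_head = nn.Linear(dim, vocab)
        # Lightweight vision decoder reconstructing encoder-style features, so the
        # original image encoding ability is not catastrophically forgotten.
        self.vision_decoder = nn.TransformerEncoder(enc_layer, num_layers=2)

    def forward(self, patch_embeds, text_embeds):
        feats = self.vision_encoder(patch_embeds)               # (B, N, dim)
        text_logits = self.text_head(self.language_decoder(text_embeds, feats))
        recon_feats = self.vision_decoder(feats)                # (B, N, dim)
        return text_logits, recon_feats
```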
no code implementations • 5 Aug 2024 • Shiwei Li, Huifeng Guo, Xing Tang, Ruiming Tang, Lu Hou, Ruixuan Li, Rui Zhang
In this survey, we provide a comprehensive review of embedding compression approaches in recommender systems.
no code implementations • 11 Jul 2024 • Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang
High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities.
1 code implementation • 31 May 2024 • Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou
Specifically, we trace back the semantic relevance flow from generated language tokens to raw visual encoder patches and the intermediate outputs produced by projectors.
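One plausible way to instantiate such tracing is gradient-times-input relevance from a generated token's logit back to the visual patch embeddings. The sketch below assumes a hypothetical model(patch_embeds, text_ids) API and is not necessarily the paper's exact procedure.

```python
import torch

def patch_relevance(model, patch_embeds, text_ids, token_pos, token_id):
    """Relevance of each visual patch to the logit of `token_id` at `token_pos`.

    `model(patch_embeds, text_ids) -> (B, T, vocab)` is an assumed interface.
    """
    patch_embeds = patch_embeds.detach().requires_grad_(True)
    logits = model(patch_embeds, text_ids)
    logits[0, token_pos, token_id].backward()
    # Gradient-times-input, summed over channels, as a per-patch relevance score.
    return (patch_embeds.grad * patch_embeds).sum(-1).abs()[0]
```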
no code implementations • 23 May 2024 • Ali Edalati, Alireza Ghaffari, Masoud Asgharian, Lu Hou, Boxing Chen, Vahid Partovi Nia
The Hessian is also used to detect the weights most salient to quantization.
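A common Hessian-based saliency criterion, in the spirit of OBS-style pruning and quantization analyses (the paper's exact rule may differ), scores each weight by the loss increase its perturbation would cause:

```python
import torch

def saliency(weight, hessian_diag):
    # Second-order estimate of loss increase if a weight is perturbed:
    # ~ 0.5 * H_ii * w_i^2 (diagonal Hessian approximation).
    return 0.5 * hessian_diag * weight.pow(2)

w = torch.randn(4, 8)
h = torch.rand(4, 8) + 1e-3                        # positive diagonal Hessian estimate
topk = saliency(w, h).flatten().topk(5).indices    # indices of most salient weights
print(topk)
```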
1 code implementation • 28 Mar 2024 • Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries.
1 code implementation • 25 Mar 2024 • Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong
Prior studies show that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to perceive and reason over both document texts and layouts (e.g., locations of texts and table cells).
no code implementations • CVPR 2024 • Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun
We find that directly using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance.
2 code implementations • 2 Mar 2024 • Ruikang Liu, Haoli Bai, Haokun Lin, Yuening Li, Han Gao, Zhengzhuo Xu, Lu Hou, Jun Yao, Chun Yuan
Such outliers are found to concentrate most of the attention scores on the initial tokens of the input, termed pivot tokens, which are crucial to the performance of quantized LLMs.
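One simple way to surface such pivot tokens, sketched below under the assumption that post-softmax attention maps are available, is to rank key positions by the average attention mass they receive:

```python
import torch

def pivot_positions(attn, top_k=2):
    """attn: (heads, query_len, key_len) post-softmax attention weights."""
    # Average attention each key position receives, across heads and queries.
    received = attn.mean(dim=(0, 1))              # (key_len,)
    return received.topk(top_k).indices

attn = torch.softmax(torch.randn(8, 16, 16), dim=-1)
print(pivot_positions(attn))                      # typically the initial positions
```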
1 code implementation • 1 Mar 2024 • Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
Motivated by these two problems, we propose the TempCompass benchmark, which introduces a diversity of temporal aspects and task formats.
no code implementations • 15 Dec 2023 • Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han
Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses.
2 code implementations • CVPR 2024 • Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou
This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.
Ranked #2 on Video-Text Retrieval on Test-of-Time (using extra training data)
1 code implementation • 29 Nov 2023 • Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou
The ability to perceive how objects change over time is a crucial ingredient in human intelligence.
1 code implementation • NeurIPS 2023 • Yuanxin Liu, Lei Li, Shuhuai Ren, Rundong Gao, Shicheng Li, Sishuo Chen, Xu Sun, Lu Hou
The multi-aspect categorization of FETV enables fine-grained analysis of the metrics' reliability in different scenarios.
1 code implementation • 29 Oct 2023 • Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou
TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding (see the sketch after this entry).
Ranked #1 on Video Retrieval on Condensed Movies (using extra training data)
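A deliberately simplified sketch of token aggregation in this spirit (TESTA itself aggregates along temporal and spatial axes separately, with learned merging) is to average adjacent token pairs; applying it twice cuts 64 tokens to 16, i.e., the 75% reduction quoted above:

```python
import torch

def merge_adjacent(tokens):
    """Halve the token count by averaging adjacent token pairs (toy rule)."""
    a, b = tokens[:, 0::2], tokens[:, 1::2]       # (B, N/2, D) each
    return (a + b) / 2

x = torch.randn(1, 64, 256)
x = merge_adjacent(merge_adjacent(x))             # 64 -> 32 -> 16 tokens (75% fewer)
print(x.shape)                                    # torch.Size([1, 16, 256])
```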
1 code implementation • ACL 2023 • Guanhua Chen, Lu Hou, Yun Chen, Wenliang Dai, Lifeng Shang, Xin Jiang, Qun Liu, Jia Pan, Wenping Wang
Furthermore, to enhance the token- and sentence-level multilingual representation of the MTE, we propose to train it with machine translation and contrastive learning jointly before the TriKD to provide a better initialization.
no code implementations • 5 Jun 2023 • Xiangyang Li, Bo Chen, Lu Hou, Ruiming Tang
Both tabular data and converted textual data are regarded as two different modalities and are separately fed into the collaborative CTR model and pre-trained language model.
1 code implementation • 19 Dec 2022 • Haoli Bai, Zhiguang Liu, Xiaojun Meng, Wentao Li, Shuang Liu, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu
While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far.
no code implementations • 12 Dec 2022 • Shiwei Li, Huifeng Guo, Lu Hou, Wei Zhang, Xing Tang, Ruiming Tang, Rui Zhang, Ruixuan Li
To this end, we formulate a novel quantization training paradigm to compress the embeddings from the training stage, termed low-precision training (LPT).
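Below is a minimal sketch of what low-precision embedding training can look like, assuming a uniform symmetric quantizer with a straight-through estimator (an illustrative setup, not necessarily LPT's exact scheme):

```python
import torch

class STEQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, n_bits=8):
        # Uniform symmetric quantization to n_bits.
        scale = x.abs().max().clamp(min=1e-8) / (2 ** (n_bits - 1) - 1)
        q = torch.round(x / scale).clamp(-(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
        return q * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None      # straight-through: pass gradients unchanged

emb = torch.nn.Parameter(torch.randn(1000, 16))
q = STEQuantize.apply(emb)         # quantized embeddings used in the forward pass
loss = q.sum()
loss.backward()                    # gradients still flow to the full-precision emb
```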
no code implementations • 21 Oct 2022 • Dongsheng Chen, Chaofan Tao, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu
Recent large-scale video-language pre-trained models have shown appealing performance on various downstream tasks.
no code implementations • ACL 2022 • Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity, and the varied distribution of weights.
no code implementations • Findings (ACL) 2022 • Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung
Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.
1 code implementation • 14 Feb 2022 • Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, Hang Xu
Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods.
Ranked #6 on Image Retrieval on MUGE Retrieval
1 code implementation • ICLR 2022 • Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu
In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.
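The late-interaction similarity itself is easy to sketch: normalize the tokens, take each image token's maximum similarity over text tokens (and vice versa), then average. Combining the two directions into one score here is a simplification; FILIP uses each direction in its own contrastive loss.

```python
import torch
import torch.nn.functional as F

def filip_similarity(img_tokens, txt_tokens):
    """img_tokens: (Ni, D), txt_tokens: (Nt, D); returns a scalar similarity."""
    img = F.normalize(img_tokens, dim=-1)
    txt = F.normalize(txt_tokens, dim=-1)
    sim = img @ txt.T                           # (Ni, Nt) token-pair similarities
    i2t = sim.max(dim=1).values.mean()          # image-to-text late interaction
    t2i = sim.max(dim=0).values.mean()          # text-to-image late interaction
    return (i2t + t2i) / 2

print(filip_similarity(torch.randn(49, 64), torch.randn(12, 64)))
```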
no code implementations • 30 Sep 2021 • Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
no code implementations • ACL 2021 • Zhiqi Huang, Lu Hou, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Transformer-based pre-trained language models like BERT, though powerful in many tasks, are expensive in both memory and computation, due to their large number of parameters.
no code implementations • 24 May 2021 • Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.
1 code implementation • ICLR 2021 • Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).
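A minimal sketch of that reweighting, assuming a softmax-style closed form with a temperature knob tau (the paper derives the exact form):

```python
import torch

def mmel_loss(per_aug_losses, tau=1.0):
    """per_aug_losses: (B, K) losses of K augmented views per example."""
    # Harder (higher-loss) augmented samples receive larger weights.
    weights = torch.softmax(per_aug_losses.detach() / tau, dim=1)
    return (weights * per_aug_losses).sum(dim=1).mean()

losses = torch.rand(4, 3, requires_grad=True)
mmel_loss(losses).backward()
```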
1 code implementation • ACL 2021 • Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization.
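The standard weight binarization underlying BinaryBERT-style models keeps only a sign plus a per-row scaling factor; a minimal sketch:

```python
import torch

def binarize(w):
    # Per-output-channel scale: mean absolute value of each row.
    alpha = w.abs().mean(dim=1, keepdim=True)
    # Each entry becomes +/- alpha (sign(0) maps to 0, a negligible edge case).
    return alpha * torch.sign(w)

w = torch.randn(4, 8)
print(binarize(w))
```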
5 code implementations • EMNLP 2020 • Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
Transformer-based pre-training models like BERT have achieved remarkable performance on many natural language processing tasks. However, these models are expensive in both computation and memory, hindering their deployment on resource-constrained devices.
3 code implementations • NeurIPS 2020 • Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Pre-trained language models like BERT, though powerful in many natural language processing tasks, are expensive in both computation and memory.
1 code implementation • NeurIPS 2019 • Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu
The long short-term memory (LSTM) network, though powerful, is expensive in both memory and computation.
no code implementations • ICLR 2019 • Lu Hou, Ruiliang Zhang, James T. Kwok
We show that (i) weight-quantized networks converge to an error related to the weight quantization resolution and weight dimension; (ii) quantizing gradients slows convergence by a factor related to the gradient quantization resolution and dimension; and (iii) clipping the gradient before quantization renders this factor dimension-free, thus allowing the use of fewer bits for gradient quantization.
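Point (iii) is easy to illustrate: clip the gradient to [-c, c] first, then stochastically (unbiasedly) round it onto a low-precision grid. The constants below are assumed for illustration only:

```python
import torch

def clip_and_quantize_grad(g, c=1.0, n_bits=4):
    g = g.clamp(-c, c)                          # clip BEFORE quantization
    levels = 2 ** n_bits - 1
    scaled = (g + c) / (2 * c) * levels         # map to [0, levels]
    low = scaled.floor()
    prob_up = scaled - low                      # stochastic (unbiased) rounding
    q = low + torch.bernoulli(prob_up)
    return q / levels * (2 * c) - c             # map back to [-c, c]

g = torch.randn(5)
print(clip_and_quantize_grad(g))
```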
no code implementations • 4 May 2018 • Lu Hou, James T. Kwok
The power law has been observed in the degree distributions of many biological neural networks.
1 code implementation • ICLR 2018 • Lu Hou, James T. Kwok
The huge size of deep networks hinders their use in small computing devices.
1 code implementation • 5 Nov 2016 • Lu Hou, Quanming Yao, James T. Kwok
Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of space and time.