Search Results for author: Zhen Qin

Found 66 papers, 23 papers with code

Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition

no code implementations 10 Jun 2024 Zhen Qin, Zhihui Zhu

However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression.

Computational Efficiency regression

You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

no code implementations 31 May 2024 Zhen Qin, Yuxin Mao, Xuyang Shen, Dong Li, Jing Zhang, Yuchao Dai, Yiran Zhong

Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed.

Image Classification Image Generation +1

TAVGBench: Benchmarking Text to Audible-Video Generation

1 code implementation 22 Apr 2024 Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 thousand hours.

Benchmarking Contrastive Learning +1

HGRN2: Gated Linear RNNs with State Expansion

2 code implementations 11 Apr 2024 Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference.

Image Classification Language Modelling

Linear Attention Sequence Parallelism

1 code implementation 3 Apr 2024 Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

In this paper, we introduce Linear Attention Sequence Parallel (LASP), an efficient SP method tailored to linear attention-based language models.

LiPO: Listwise Preference Optimization through Learning-to-Rank

1 code implementation 2 Feb 2024 Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang

In this work, we formulate the LM alignment as a listwise ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt.

Learning-To-Rank

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

1 code implementation 29 Jan 2024 Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

CO2 is able to attain a high scalability even on extensive multi-node clusters constrained by very limited communication bandwidth.

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

2 code implementations 9 Jan 2024 Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

With its ability to process tokens in linear computational complexities, linear attention, in theory, can handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with a fixed memory consumption.

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

no code implementations 5 Jan 2024 Zhen Qin, Michael B. Wakin, Zhihui Zhu

We first delve into the TT factorization problem and establish the local linear convergence of RGD.

Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks

no code implementations 24 Nov 2023 Zhen Qin, Xuwei Tan, Zhihui Zhu

Enforcing orthonormal or isometric property for the weight matrices has been shown to enhance the training of deep neural networks by mitigating gradient exploding/vanishing and increasing the robustness of the learned networks.

Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?

no code implementations 15 Nov 2023 Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky

In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers?

Instruction Following Language Modelling +2

Accelerating Toeplitz Neural Network with Constant-time Inference Complexity

1 code implementation 15 Nov 2023 Zhen Qin, Yiran Zhong

On the other hand, State Space Models (SSMs) achieve lower performance than TNNs in language modeling but offer the advantage of constant inference complexity.

Language Modelling

On What Basis? Predicting Text Preference Via Structured Comparative Reasoning

no code implementations 14 Nov 2023 Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky

Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning.

Hallucination Retrieval

Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning

no code implementations 13 Nov 2023 Yue Yu, Jiaming Shen, Tianqi Liu, Zhen Qin, Jing Nathan Yan, Jialu Liu, Chao Zhang, Michael Bendersky

To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.

In-Context Learning Language Modelling +2

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization

no code implementations 24 Oct 2023 Zhen Qin, Zhishuai Liu, Pan Xu

Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm.

PaRaDe: Passage Ranking using Demonstrations with Large Language Models

no code implementations 22 Oct 2023 Andrew Drozdov, Honglei Zhuang, Zhuyun Dai, Zhen Qin, Razieh Rahimi, Xuanhui Wang, Dana Alon, Mohit Iyyer, Andrew McCallum, Donald Metzler, Kai Hui

Recent studies show that large language models (LLMs) can be instructed to effectively perform zero-shot passage re-ranking, in which the results of a first stage retrieval method, such as BM25, are rated and reordered to improve relevance.

Passage Ranking Passage Re-Ranking +6

Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels

no code implementations 21 Oct 2023 Honglei Zhuang, Zhen Qin, Kai Hui, Junru Wu, Le Yan, Xuanhui Wang, Michael Bendersky

We propose to incorporate fine-grained relevance labels into the prompt for LLM rankers, enabling them to better differentiate among documents with different levels of relevance to the query and thus derive a more accurate ranking.

Resisting Backdoor Attacks in Federated Learning via Bidirectional Elections and Individual Perspective

1 code implementation 28 Sep 2023 Zhen Qin, Feiyi Chen, Chen Zhi, Xueqiang Yan, Shuiguang Deng

Existing approaches defend against backdoor attacks in federated learning (FL) mainly through a) mitigating the impact of infected models, or b) excluding infected models.

Federated Learning

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

1 code implementation 8 Aug 2023 Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes

Given a pair of augmented views, our approach regularizes the activation intensities between them, while also ensuring that the affinity across regions within each view remains consistent.

Object Localization Weakly supervised Semantic Segmentation +1

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

2 code implementations 27 Jul 2023 Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.

Language Modelling Large Language Model

Exploring Transformer Extrapolation

no code implementations 19 Jul 2023 Zhen Qin, Yiran Zhong, Hui Deng

While these methods perform well on a variety of corpora, the conditions for length extrapolation have yet to be investigated.

Language Modelling

Linearized Relative Positional Encoding

no code implementations 18 Jul 2023 Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Meanwhile, it emphasizes a general paradigm for designing a broader range of relative positional encoding methods that are applicable to linear transformers.

Image Classification Language Modelling +2

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

no code implementations 30 Jun 2023 Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky

Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem.

Learning to Rank when Grades Matter

no code implementations 14 Jun 2023 Le Yan, Zhen Qin, Gil Shamir, Dong Lin, Xuanhui Wang, Mike Bendersky

In this paper, we conduct a rigorous study of learning to rank with grades, where both ranking performance and grade prediction performance are important.

Learning-To-Rank

Toeplitz Neural Network for Sequence Modeling

2 code implementations 8 May 2023 Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Sequence modeling has important applications in natural language processing and computer vision.

Language Modelling Position

Query Expansion by Prompting Large Language Models

no code implementations 5 May 2023 Rolf Jagerman, Honglei Zhuang, Zhen Qin, Xuanhui Wang, Michael Bendersky

Query expansion is a widely used technique to improve the recall of search systems.

Towards Disentangling Relevance and Bias in Unbiased Learning to Rank

no code implementations 28 Dec 2022 Yunan Zhang, Le Yan, Zhen Qin, Honglei Zhuang, Jiaming Shen, Xuanhui Wang, Michael Bendersky, Marc Najork

We give both theoretical analysis and empirical results to show the negative effects of such a correlation on the relevance tower.

Learning-To-Rank

Regression Compatible Listwise Objectives for Calibrated Ranking with Binary Relevance

no code implementations 2 Nov 2022 Aijun Bai, Rolf Jagerman, Zhen Qin, Le Yan, Pratyush Kar, Bing-Rong Lin, Xuanhui Wang, Michael Bendersky, Marc Najork

As Learning-to-Rank (LTR) approaches primarily seek to improve ranking quality, their output scores are not scale-calibrated by design.

Learning-To-Rank regression

Proportionate Recursive Maximum Correntropy Criterion Adaptive Filtering Algorithms and their Performance Analysis

no code implementations 22 Oct 2022 Zhen Qin, Jun Tao, Le Yang, Ming Jiang

Motivated by the success of our recently proposed proportionate recursive least squares (PRLS) algorithm for sparse system identification, we propose to introduce the proportionate updating (PU) mechanism into the RMCC, leading to two sparsity-aware RMCC algorithms: the proportionate recursive MCC (PRMCC) algorithm and the combinational PRMCC (CPRMCC) algorithm.

The Devil in Linear Transformer

1 code implementation 19 Oct 2022 Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong

In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation adversely impact the convergence of linear transformer models; 2) attention dilution which trivially distributes attention scores over long sequences while neglecting neighbouring structures.

Language Modelling Text Classification

Linear Video Transformer with Feature Fixation

no code implementations 15 Oct 2022 Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.

Feature Importance Video Classification

RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses

no code implementations 12 Oct 2022 Honglei Zhuang, Zhen Qin, Rolf Jagerman, Kai Hui, Ji Ma, Jing Lu, Jianmo Ni, Xuanhui Wang, Michael Bendersky

Recently, substantial progress has been made in text ranking based on pretrained language models such as BERT.

Decoder

A Validation Approach to Over-parameterized Matrix and Image Recovery

no code implementations 21 Sep 2022 Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu

In this paper, we study the problem of recovering a low-rank matrix from a number of noisy random linear measurements.

Image Restoration

Neural Architecture Search on Efficient Transformers and Beyond

no code implementations 28 Jul 2022 Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong

To address this issue, we propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.

Computational Efficiency Image Classification +2

Error Analysis of Tensor-Train Cross Approximation

no code implementations 9 Jul 2022 Zhen Qin, Alexander Lidiak, Zhexuan Gong, Gongguo Tang, Michael B. Wakin, Zhihui Zhu

Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, overcoming the curse of dimensionality.

Vicinity Vision Transformer

1 code implementation 21 Jun 2022 Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.

Image Classification

cosFormer: Rethinking Softmax in Attention

3 code implementations ICLR 2022 Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

As one of its core components, the softmax attention helps to capture long-range dependencies yet prohibits its scale-up due to the quadratic space and time complexity to the sequence length.

D4RL Language Modelling +1

Transformer Memory as a Differentiable Search Index

1 code implementation 14 Feb 2022 Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.

Information Retrieval Retrieval

Rank4Class: A Ranking Formulation for Multiclass Classification

no code implementations 17 Dec 2021 Nan Wang, Zhen Qin, Le Yan, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork

Multiclass classification (MCC) is a fundamental machine learning problem of classifying each instance into one of a predefined set of classes.

Classification Image Classification +4

Improving Neural Ranking via Lossless Knowledge Distillation

no code implementations 30 Sep 2021 Zhen Qin, Le Yan, Yi Tay, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork

We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their teachers.

Knowledge Distillation Learning-To-Rank

Rank4Class: Examining Multiclass Classification through the Lens of Learning to Rank

no code implementations 29 Sep 2021 Nan Wang, Zhen Qin, Le Yan, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork

We further demonstrate that the most popular MCC architecture in deep learning can be equivalently formulated as an LTR pipeline, with a specific set of choices in terms of ranking model architecture and loss function.

Image Classification Information Retrieval +4

Are Pretrained Convolutions Better than Pretrained Transformers?

1 code implementation ACL 2021 Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

Are Pre-trained Convolutions Better than Pre-trained Transformers?

1 code implementation 7 May 2021 Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

OmniNet: Omnidirectional Representations from Transformers

1 code implementation 1 Mar 2021 Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.

de-en Few-Shot Learning +3

Neural Rankers are hitherto Outperformed by Gradient Boosted Decision Trees

no code implementations ICLR 2021 Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

We first validate this concern by showing that most recent neural LTR models are, by a large margin, inferior to the best publicly available Gradient Boosted Decision Trees (GBDT) in terms of their reported ranking accuracy on benchmark datasets.

Learning-To-Rank

DeepKeyGen: A Deep Learning-based Stream Cipher Generator for Medical Image Encryption and Decryption

no code implementations 21 Dec 2020 Yi Ding, Fuyuan Tan, Zhen Qin, Mingsheng Cao, Kim-Kwang Raymond Choo, Zhiguang Qin

In this paper, a novel deep learning-based key generation network (DeepKeyGen) is proposed as a stream cipher generator to generate the private key, which can then be used for encrypting and decrypting medical images.

Generative Adversarial Network

Do RNN and LSTM have Long Memory?

1 code implementation ICML 2020 Jingyu Zhao, Feiqing Huang, Jia Lv, Yanjie Duan, Zhen Qin, Guodong Li, Guangjian Tian

The LSTM network was proposed to overcome the difficulty in learning long-term dependence, and has made significant advancements in applications.

Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction

no code implementations 18 May 2020 Nan Wang, Zhen Qin, Xuanhui Wang, Hongning Wang

Recent advances in unbiased learning to rank (LTR) count on Inverse Propensity Scoring (IPS) to eliminate bias in implicit feedback.

Learning-To-Rank

Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering

no code implementations 15 Sep 2018 Jiaming Shen, Maryam Karimzadehgan, Michael Bendersky, Zhen Qin, Donald Metzler

In this paper, we study how to obtain the query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models.

Clustering Multi-Task Learning +1

An Online Learned Elementary Grouping Model for Multi-target Tracking

no code implementations CVPR 2014 Xiaojing Chen, Zhen Qin, Le An, Bir Bhanu

We introduce an online approach to learn possible elementary groups (groups that contain only two targets) for inferring high level context that can be used to improve multi-target tracking in a data-association based framework.

Efficient Online Bootstrapping for Large Scale Learning

no code implementations 18 Dec 2013 Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li, John Langford

Bootstrapping is a useful technique for estimating the uncertainty of a predictor, for example, confidence intervals for prediction.
