1 code implementation • 30 Jan 2025 • Adarsh Kappiyath, Abhra Chaudhuri, Ajay Jaiswal, Ziquan Liu, Yunpeng Li, Xiatian Zhu, Lu Yin
Ranking samples by fine-grained estimates of spuriosity (the degree to which spurious cues are present) has recently been shown to significantly benefit bias mitigation, over the traditional binary biased-vs-unbiased partitioning of training sets.
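As a rough illustration of the idea (not the authors' exact procedure), the sketch below turns hypothetical per-sample spuriosity estimates into fine-grained ranks and then into training weights; the scoring values and the weighting rule are placeholders.

```python
import numpy as np

def spuriosity_ranks(scores: np.ndarray) -> np.ndarray:
    """Rank samples by spuriosity (0 = least spurious). `scores` is assumed to be
    a per-sample spuriosity estimate produced by some upstream scorer."""
    return np.argsort(np.argsort(scores))

def rank_based_weights(scores: np.ndarray) -> np.ndarray:
    """One simple way to exploit fine-grained ranks for bias mitigation:
    down-weight samples in proportion to how spurious they are."""
    ranks = spuriosity_ranks(scores)
    n = len(scores)
    return 1.0 - ranks / max(n - 1, 1)          # most spurious sample -> weight 0

scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])    # hypothetical spuriosity estimates
print(rank_based_weights(scores))               # [0.  1.  0.5 0.25 0.75]
```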
1 code implementation • 18 Dec 2024 • Pengxiang Li, Lu Yin, Shiwei Liu
In contrast, Post-Layer Normalization (Post-LN) preserves larger gradient norms in deeper layers but suffers from vanishing gradients in earlier layers.
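A minimal PyTorch sketch of the two placements being contrasted: Pre-LN normalizes before each sublayer, while Post-LN normalizes after the residual addition, which is what drives the different gradient-norm profiles across depth. The dimensions and the toy feed-forward sublayer are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Pre-LN: y = x + f(LayerNorm(x)); gradients flow through the residual path,
    keeping earlier layers stable but shrinking updates in the deepest layers."""
    def __init__(self, dim, sublayer):
        super().__init__()
        self.norm, self.sublayer = nn.LayerNorm(dim), sublayer
    def forward(self, x):
        return x + self.sublayer(self.norm(x))

class PostLNBlock(nn.Module):
    """Post-LN: y = LayerNorm(x + f(x)); gradient norms are larger near the output
    but can vanish towards the input of a deep stack."""
    def __init__(self, dim, sublayer):
        super().__init__()
        self.norm, self.sublayer = nn.LayerNorm(dim), sublayer
    def forward(self, x):
        return self.norm(x + self.sublayer(x))

dim = 16
make_ffn = lambda: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
pre_stack = nn.Sequential(*[PreLNBlock(dim, make_ffn()) for _ in range(12)])
post_stack = nn.Sequential(*[PostLNBlock(dim, make_ffn()) for _ in range(12)])
```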
no code implementations • 17 Dec 2024 • Tim van Engeland, Lu Yin, Vlado Menkovski
This label serves as a basis for the comparison between the query object and the objects in the support set.
1 code implementation • 26 Nov 2024 • Mingyu Cao, Gen Li, Jie Ji, JiaQi Zhang, Xiaolong Ma, Shiwei Liu, Lu Yin
Mixture-of-Experts (MoE) has garnered significant attention for its ability to scale up neural networks while utilizing the same or even fewer active parameters.
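For readers unfamiliar with the mechanism, a toy top-k MoE layer is sketched below: a router scores experts per token and only the top-k experts are evaluated, so the number of active parameters stays roughly constant as the expert count grows. The dimensions and routing details are illustrative assumptions, not the setup studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        topv, topi = gate.topk(self.k, dim=-1)  # only k experts are active per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out

y = TopKMoE()(torch.randn(10, 64))
```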
no code implementations • 20 Nov 2024 • Andy Li, Aiden Durrant, Milan Markovic, Lu Yin, Georgios Leontidis
Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks, which is crucial for deploying models on memory- and power-constrained devices.
1 code implementation • 12 Nov 2024 • Jiaxi Li, Lu Yin, Xilu Wang
The integration of Large Language Models (LLMs) into autonomous driving systems offers promising enhancements in environmental understanding and decision-making.
1 code implementation • 9 Nov 2024 • Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Lu Yin, Junyuan Liu
Existing methods for learning urban space representations from Point-of-Interest (POI) data face several limitations, including issues with geographical delineation, inadequate spatial information modelling, underutilisation of POI semantic attributes, and computational inefficiencies.
1 code implementation • 2 Nov 2024 • Yuxiang Guo, Lu Yin, Bo Jiang, JiaQi Zhang
Standard alignment techniques, such as Direct Preference Optimization (DPO), often rely on the binary Bradley-Terry (BT) model, which can struggle to capture the complexities of human preferences -- particularly in the presence of noisy or inconsistent labels and frequent ties.
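For context, the sketch below shows the standard DPO objective under the binary Bradley-Terry model that this entry refers to: the preference probability is a sigmoid of the reward margin between the chosen and rejected responses. This is the generic formulation, not the paper's proposed alternative for handling ties and noisy labels.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss under the binary Bradley-Terry (BT) model.
    logp_*: policy log-probs of the chosen (w) and rejected (l) responses;
    ref_logp_*: the same quantities under the frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()         # BT: p(w preferred over l) = sigmoid(margin)

loss = dpo_loss(torch.tensor([-4.2]), torch.tensor([-5.0]),
                torch.tensor([-4.5]), torch.tensor([-4.9]))
```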
no code implementations • 10 Oct 2024 • Adriana Fernandez-Lopez, Shiwei Liu, Lu Yin, Stavros Petridis, Maja Pantic
This paper investigates the under-explored area of low-rank weight training for large-scale Conformer-based speech recognition models from scratch.
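A minimal sketch of what low-rank weight training means in practice: each dense weight matrix is replaced by a product of two thin factors of rank r and trained from scratch in that factorized form. The rank and layer sizes below are placeholders, not the Conformer configuration studied in the paper.

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Factorized linear layer: W ~= B @ A with rank r << min(d_in, d_out)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)   # (rank x d_in)
        self.B = nn.Linear(rank, d_out, bias=True)   # (d_out x rank)
    def forward(self, x):
        return self.B(self.A(x))

# A rank-32 replacement for a 512x512 projection: roughly 8x fewer weight parameters.
layer = LowRankLinear(512, 512, rank=32)
print(sum(p.numel() for p in layer.parameters()))
```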
1 code implementation • 9 Oct 2024 • Abhinav Bandari, Lu Yin, Cheng-Yu Hsieh, Ajay Kumar Jaiswal, Tianlong Chen, Li Shen, Ranjay Krishna, Shiwei Liu
In this study, we evaluate the choice of calibration data for LLM pruning across a wide range of datasets most commonly used in LLM training and evaluation, including four pre-training datasets as well as three categories of downstream tasks encompassing nine datasets.
no code implementations • 13 Sep 2024 • Qiao Xiao, Boqian Wu, Lu Yin, Christopher Neil Gadzinski, Tianjin Huang, Mykola Pechenizkiy, Decebal Constantin Mocanu
These hard samples play a crucial role in the optimal performance of deep neural networks.
no code implementations • 14 Aug 2024 • Ricky Maulana Fajri, Yulong Pei, Lu Yin, Mykola Pechenizkiy
Despite significant advancements in active learning and adversarial attacks, the intersection of these two fields remains underexplored, particularly in developing robust active learning frameworks against dynamic adversarial threats.
1 code implementation • 15 Jul 2024 • Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang
Modern Large Language Models (LLMs) are composed of matrices with billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.
2 code implementations • 11 Jul 2024 • Zhenyu Zhang, Ajay Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang
To address these limitations, we introduce Q-Galore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore.
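A heavily simplified sketch of the two ingredients named here, low-rank gradient projection in the spirit of GaLore combined with low-precision storage of the projection basis; the SVD-based projection, the int8 scheme, and the rank are illustrative assumptions rather than the actual Q-Galore algorithm.

```python
import torch

def low_rank_project(grad: torch.Tensor, rank: int):
    """Project a weight gradient onto its top-r left singular subspace,
    so optimizer state can be kept in a compact (rank x n) form."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                               # projection basis (updated lazily in practice)
    return P, P.T @ grad

def quantize_int8(t: torch.Tensor):
    """Symmetric int8 quantization, e.g. for storing the projection basis cheaply."""
    scale = t.abs().max() / 127.0
    return (t / scale).round().clamp(-127, 127).to(torch.int8), scale

grad = torch.randn(256, 256)
P, g_lowrank = low_rank_project(grad, rank=16)
P_q, scale = quantize_int8(P)                     # basis kept in 8-bit
grad_approx = (P_q.float() * scale) @ g_lowrank   # projected back when applying the update
```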
no code implementations • 26 Jun 2024 • Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu
The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data.
no code implementations • 25 Jun 2024 • Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Lu Yin, Qiao Xiao, Stavros Petridis, Shiwei Liu, Maja Pantic
In this study, we propose a regularization technique that facilitates the training of visual and audio-visual speech recognition models (VSR and AVSR) from scratch.
1 code implementation • 28 May 2024 • Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu
The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks.
no code implementations • 8 May 2024 • Zheyan Qu, Lu Yin, Zitong Yu, Wenbo Wang, Xing Zhang
Moreover, to better align LLM responses with user needs, a novel method for discrete prompt optimization based on LLM-as-Judge is introduced.
no code implementations • 5 Apr 2024 • Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella
In this work, we observe that the computationally expensive feed-forward blocks of LLM layers saturate, and we propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.
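To make the saturation intuition concrete, the sketch below decides on a few calibration tokens which feed-forward blocks barely change the hidden state (measured by cosine similarity of input and output) and then bypasses those blocks at inference time. The threshold and the similarity criterion are illustrative assumptions, not the exact FFN-SkipLLM policy.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def calibrate_skip_mask(hidden, ffn_blocks, threshold=0.99):
    """Mark FFN blocks as 'saturated' when their residual update barely rotates
    the hidden state on calibration inputs; those blocks can later be skipped."""
    skip, h = [], hidden
    for ffn in ffn_blocks:
        out = h + ffn(h)
        sim = F.cosine_similarity(h.flatten(), out.flatten(), dim=0)
        skip.append(bool(sim > threshold))
        h = out
    return skip

def forward_with_skips(hidden, ffn_blocks, skip_mask):
    h = hidden
    for ffn, skip in zip(ffn_blocks, skip_mask):
        if not skip:                  # saturated blocks are bypassed entirely
            h = h + ffn(h)
    return h

blocks = [torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.GELU(),
                              torch.nn.Linear(64, 16)) for _ in range(4)]
x = torch.randn(2, 16)
mask = calibrate_skip_mask(x, blocks)
y = forward_with_skips(x, blocks, mask)
```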
1 code implementation • 7 Dec 2023 • Ricky Maulana Fajri, Yulong Pei, Lu Yin, Mykola Pechenizkiy
To address this problem, we propose the Structural-Clustering PageRank method for improved Active learning (SPA) specifically designed for graph-structured data.
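As an illustrative approximation of how structural clustering and PageRank can be combined for active learning on graphs (not the exact SPA algorithm), the sketch below detects communities, scores nodes by PageRank, and queries the top-ranked node of each community; the community detector and budget are assumptions.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def spa_style_query(G: nx.Graph, budget: int):
    """Pick nodes to label: one high-PageRank representative per structural community."""
    communities = louvain_communities(G, seed=0)
    pr = nx.pagerank(G)
    picks = []
    for com in sorted(communities, key=len, reverse=True):
        picks.append(max(com, key=pr.get))        # most central node of this community
        if len(picks) == budget:
            break
    return picks

G = nx.karate_club_graph()
print(spa_style_query(G, budget=3))
```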
1 code implementation • 7 Dec 2023 • Boqian Wu, Qiao Xiao, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Decebal Constantin Mocanu, Maurice van Keulen, Elena Mocanu
E2ENet achieves comparable accuracy on the large-scale AMOS-CT challenge while saving over 68% of the parameter count and 29% of FLOPs in the inference phase, compared with the previous best-performing method.
1 code implementation • 5 Dec 2023 • Jiaxu Zhao, Lu Yin, Shiwei Liu, Meng Fang, Mykola Pechenizkiy
These bias attributes are strongly spuriously correlated with the target variable, causing the models to be biased towards spurious correlations (i.e., bias-conflicting).
1 code implementation • 8 Oct 2023 • Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu
Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size.
1 code implementation • 29 Sep 2023 • Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang
Contrary to this belief, this paper presents a counter-argument: small-magnitude weights of pre-trained models encode vital knowledge essential for tackling difficult downstream tasks; as more pre-trained weights are pruned by magnitude, the performance drop on downstream tasks grows monotonically across the difficulty spectrum.
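The pruning operation under discussion is plain magnitude pruning; the sketch below zeroes out the smallest-magnitude fraction of a pre-trained weight matrix, the operation the paper argues can destroy knowledge needed for harder downstream tasks. The sparsity level is arbitrary.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of entries with the smallest |w|."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(1024, 1024)
w_pruned = magnitude_prune(w, sparsity=0.5)   # the discarded small weights may carry
print((w_pruned == 0).float().mean())         # knowledge needed for hard tasks
```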
1 code implementation • 25 Jun 2023 • Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlado Menkovski, Lu Yin, Yulong Pei, Mykola Pechenizkiy
Although adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization.
1 code implementation • 30 May 2023 • Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu
We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, due to more similar architectures.
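The teacher-student setup implied here is standard logit distillation; a generic sketch of that loss is given below for context (Hinton-style KD with temperature and a hard-label term), with hyperparameters as placeholders rather than the paper's recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Generic logit distillation: match the teacher's softened predictions,
    mixed with the ground-truth cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy example: a large-kernel-ConvNet teacher's logits guiding a small-kernel student.
s, t = torch.randn(8, 10, requires_grad=True), torch.randn(8, 10)
loss = distillation_loss(s, t, torch.randint(0, 10, (8,)))
```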
1 code implementation • 10 Mar 2023 • Zahra Atashgahi, Xuhao Zhang, Neil Kichler, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu
Feature selection, which selects an informative subset of variables from data, not only enhances model interpretability and performance but also alleviates resource demands.
1 code implementation • 28 Nov 2022 • Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu
Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks).
1 code implementation • 23 Aug 2022 • Lu Yin, Shiwei Liu, Meng Fang, Tianjin Huang, Vlado Menkovski, Mykola Pechenizkiy
We call our method Lottery Pools.
no code implementations • 30 May 2022 • Lu Yin, Vlado Menkovski, Meng Fang, Tianjin Huang, Yulong Pei, Mykola Pechenizkiy, Decebal Constantin Mocanu, Shiwei Liu
Recent works on sparse neural network training (sparse training) have shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch.
no code implementations • 16 Dec 2021 • Lu Yin, Vlado Menkovski, Yulong Pei, Mykola Pechenizkiy
In this work, we advance few-shot learning towards this more challenging scenario, semantic-based few-shot learning, and propose a method to address this paradigm by capturing inner semantic relationships using interactive psychometric learning.
no code implementations • 7 Jul 2021 • Lu Yin, Vlado Menkovski, Shiwei Liu, Mykola Pechenizkiy
One of the major challenges in the supervised learning approaches is expressing and collecting the rich knowledge that experts have with respect to the meaning present in the image data.
2 code implementations • NeurIPS 2021 • Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu
Works on the lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) have recently drawn considerable attention to post-training pruning (iterative magnitude pruning) and before-training pruning (pruning at initialization).
Ranked #3 on Sparse Learning on ImageNet
1 code implementation • 28 May 2021 • Yongji Wu, Defu Lian, Neil Zhenqiang Gong, Lu Yin, Mingyang Yin, Jingren Zhou, Hongxia Yang
Inspired by the idea of vector quantization that uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention.
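A rough sketch of the vector-quantization idea described here: keys and values are approximated by a small codebook of centroids, so attention is computed against C centroids instead of all L items, making the cost linear in sequence length. The codebook size, the hard nearest-centroid assignment, and the random stand-in centroids are illustrative assumptions, not the exact LISA formulation.

```python
import torch
import torch.nn.functional as F

def codebook_attention(q, k, v, codebook):
    """Attend to C centroids instead of all L keys, so cost is O(L*C) rather than O(L^2).
    q, k, v: (L, d); codebook: (C, d) centroids approximating the keys."""
    L, d = k.shape
    C = codebook.shape[0]
    assign = torch.cdist(k, codebook).argmin(dim=-1)          # nearest centroid per key
    counts = torch.zeros(C).index_add_(0, assign, torch.ones(L))
    v_code = torch.zeros(C, v.shape[-1]).index_add_(0, assign, v)
    v_code = v_code / counts.clamp(min=1).unsqueeze(-1)       # mean value per centroid
    attn = F.softmax(q @ codebook.T / d ** 0.5, dim=-1)       # (L, C)
    return attn @ v_code

L, d, C = 128, 32, 8
q, k, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)
centroids = k[torch.randperm(L)[:C]]      # stand-in for learned / k-means centroids
out = codebook_attention(q, k, v, centroids)
```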
no code implementations • 28 May 2021 • Yongji Wu, Lu Yin, Defu Lian, Mingyang Yin, Neil Zhenqiang Gong, Jingren Zhou, Hongxia Yang
With the rapid development of these services in the last two decades, users have accumulated a massive amount of behavior data.
4 code implementations • 4 Feb 2021 • Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy
By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform an Over-Parameterization in the space-time manifold, closing the gap in the expressibility between sparse training and dense training.
Ranked #4 on Sparse Learning on ImageNet
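A minimal sketch of the "continuously exploring sparse connectivities" loop referred to above: train under a fixed sparsity budget, periodically drop the weakest active weights by magnitude, and grow the same number of new connections (randomly here; gradient-based growth is also common). The update fraction, random growth, and zero initialization of new connections are placeholders, not the paper's hyperparameters.

```python
import torch

def prune_and_grow(weight: torch.Tensor, mask: torch.Tensor, update_frac=0.3):
    """One connectivity-update step of dynamic sparse training:
    drop the weakest active weights, grow an equal number of inactive ones."""
    active = mask.nonzero(as_tuple=False)
    n_update = int(update_frac * len(active))
    # Drop: smallest-magnitude currently-active weights.
    mags = (weight * mask).abs()[mask.bool()]
    drop_idx = active[mags.argsort()[:n_update]]
    mask[drop_idx[:, 0], drop_idx[:, 1]] = 0
    # Grow: random currently-inactive positions, initialized at zero.
    inactive = (mask == 0).nonzero(as_tuple=False)
    grow_idx = inactive[torch.randperm(len(inactive))[:n_update]]
    mask[grow_idx[:, 0], grow_idx[:, 1]] = 1
    weight.data[grow_idx[:, 0], grow_idx[:, 1]] = 0.0
    return mask

w = torch.randn(64, 64)
m = (torch.rand(64, 64) < 0.1).float()    # ~90% sparse at initialization
m = prune_and_grow(w, m)                  # called periodically during training
```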
no code implementations • 14 Apr 2020 • Lu Yin, Vlado Menkovski, Mykola Pechenizkiy
The main reason for such a reductionist approach is the difficulty in eliciting the domain knowledge from the experts.
no code implementations • 10 Mar 2020 • Chenjie Wang, Bin Luo, Yun Zhang, Qing Zhao, Lu Yin, Wei Wang, Xin Su, Yajun Wang, Chengyuan Li
The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D model of the moving objects and the trajectories of the camera and the moving objects.