Search Results for author: James O' Neill

Found 18 papers, 0 papers with code

Gradient Sparsification For Masked Fine-Tuning of Transformers

no code implementations 19 Jul 2023 James O' Neill, Sourav Dutta

We introduce GradDrop and variants thereof, a class of gradient sparsification methods that mask gradients during the backward pass, acting as gradient noise.

Transfer Learning
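
The snippet above describes masking gradients during the backward pass so they act as gradient noise. As a rough illustration of that idea (not the paper's exact GradDrop recipe), the sketch below zeroes a random subset of each parameter's gradient via PyTorch tensor hooks; the Bernoulli mask and the drop probability are assumptions.

```python
import torch
import torch.nn as nn

def attach_gradient_masking(module: nn.Module, drop_prob: float = 0.5) -> None:
    """Register hooks that zero a random subset of each parameter's gradient."""
    for param in module.parameters():
        if param.requires_grad:
            def mask_grad(grad, p=drop_prob):
                keep = (torch.rand_like(grad) > p).to(grad.dtype)
                return grad * keep  # dropped entries receive no update this step
            param.register_hook(mask_grad)

model = nn.Linear(10, 2)
attach_gradient_masking(model, drop_prob=0.5)
model(torch.randn(4, 10)).sum().backward()  # gradients arrive already sparsified
```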

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models

no code implementations 12 Jul 2023 James O' Neill, Sourav Dutta

We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models.

Quantization XLM-R
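
The paper above studies post-training quantization (PTQ) and quantization-aware training for Transformer language models. Below is a minimal PTQ example using PyTorch dynamic quantization on linear layers; the toy MLP stands in for the linear sublayers of a Transformer, and the paper's self-distillation objective is not shown.

```python
import torch
import torch.nn as nn

# Toy stand-in for the feed-forward sublayers of a Transformer language model.
model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 256))

# Post-training dynamic quantization: weights are stored in int8 and
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 256)
print(quantized(x).shape)  # same interface, smaller weights
```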

Aligned Weight Regularizers for Pruning Pretrained Neural Networks

no code implementations Findings (ACL) 2022 James O' Neill, Sourav Dutta, Haytham Assem

While various avenues of research have been explored for iterative pruning, little is known about what effect pruning has on zero-shot test performance and its potential implications for the choice of pruning criteria.

Language Modelling Model Compression

Deep Neural Compression Via Concurrent Pruning and Self-Distillation

no code implementations 30 Sep 2021 James O' Neill, Sourav Dutta, Haytham Assem

Pruning aims to reduce the number of parameters while maintaining performance close to the original network.

Knowledge Distillation Language Modelling
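
As a reference point for the pruning objective stated above, here is a generic magnitude-pruning sketch using PyTorch's pruning utilities; the concurrent self-distillation component of the paper is not reproduced.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

sparsity = (model[0].weight == 0).float().mean().item()
print(f"Layer 0 sparsity: {sparsity:.2f}")  # roughly 0.50
```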

Self-Distilled Pruning Of Neural Networks

no code implementations 29 Sep 2021 James O' Neill, Sourav Dutta, Haytham Assem

Pruning aims to reduce the number of parameters while maintaining performance close to the original network.

Knowledge Distillation Language Modelling

Semantically-Conditioned Negative Samples for Efficient Contrastive Learning

no code implementations 12 Feb 2021 James O' Neill, Danushka Bollegala

In the knowledge distillation setting, (1) the performance of student networks increases by 4.56 percentage points on Tiny-ImageNet-200 and 3.29 percentage points on CIFAR-100 over student networks trained with no teacher, and (2) by 1.23 and 1.72 percentage points respectively over a hard-to-beat baseline (Hinton et al., 2015).

Contrastive Learning Knowledge Distillation

k-Neighbor Based Curriculum Sampling for Sequence Prediction

no code implementations 22 Jan 2021 James O' Neill, Danushka Bollegala

At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past targets that are provided during training.

Language Modelling
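
The abstract above describes the train/test mismatch in which a sequence predictor must consume its own past predictions at test time. A generic scheduled-sampling-style training loop is sketched below to illustrate the mismatch being addressed; the paper's k-neighbor curriculum is not reproduced, and decoder_step is a hypothetical single-step decoder.

```python
import random
import torch

def unroll_with_sampling(decoder_step, targets, teacher_forcing_prob=0.75):
    """Unroll a decoder, feeding either the gold token or the model's own prediction."""
    inp = targets[:, 0]                      # start tokens, shape (batch,)
    logits_per_step = []
    for t in range(1, targets.size(1)):
        logits = decoder_step(inp)           # (batch, vocab)
        logits_per_step.append(logits)
        prediction = logits.argmax(dim=-1)
        # With probability teacher_forcing_prob feed the target, else the prediction.
        inp = targets[:, t] if random.random() < teacher_forcing_prob else prediction
    return torch.stack(logits_per_step, dim=1)
```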

Compressing Deep Neural Networks via Layer Fusion

no code implementations 29 Jul 2020 James O' Neill, Greg Ver Steeg, Aram Galstyan

This paper proposes layer fusion - a model compression technique that discovers which weights to combine and then fuses weights of similar fully-connected, convolutional and attention layers.

Exponential degradation Language Modelling +1
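
To make the fusion idea above concrete, here is a toy sketch that averages the weights of two fully-connected layers when they are sufficiently similar; the cosine-similarity criterion and threshold are assumptions, not the paper's alignment and fusion procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def maybe_fuse(a: nn.Linear, b: nn.Linear, threshold: float = 0.9) -> bool:
    """Average layer b into layer a if their (same-shaped) weights are similar enough."""
    sim = F.cosine_similarity(a.weight.flatten(), b.weight.flatten(), dim=0).item()
    if sim < threshold:
        return False
    with torch.no_grad():
        a.weight.copy_((a.weight + b.weight) / 2)
        a.bias.copy_((a.bias + b.bias) / 2)
    return True  # caller can now drop layer b and route its inputs through a
```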

An Overview of Neural Network Compression

no code implementations 5 Jun 2020 James O' Neill

Thus, in recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer.

Knowledge Distillation Neural Network Compression +2

Transfer Reward Learning for Policy Gradient-Based Text Generation

no code implementations 9 Sep 2019 James O' Neill, Danushka Bollegala

However, we argue that current n-gram overlap-based measures used as rewards can be improved by using model-based rewards transferred from tasks that directly compare the similarity of sentence pairs.

Conditional Text Generation Image Captioning +5

Learning To Avoid Negative Transfer in Few Shot Transfer Learning

no code implementations 24 Mar 2019 James O' Neill

However, transferring all parameters, some of which are irrelevant for a target task, can lead to sub-optimal results and can have a negative effect on performance, referred to as negative transfer.

Few-Shot Learning Natural Language Inference +2

Error-Correcting Neural Sequence Prediction

no code implementations 21 Jan 2019 James O' Neill, Danushka Bollegala

We propose a novel neural sequence prediction method based on error-correcting output codes that avoids exact softmax normalization and allows for a tradeoff between speed and performance.

Image Captioning Language Modelling +1
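
Below is a generic error-correcting-output-code classifier head, illustrating how the exact softmax normalization mentioned above can be avoided: each token gets a binary codeword, the model predicts the bits with independent sigmoids, and decoding picks the best-matching codeword. The random codebook here is an assumption; the paper's code construction may differ.

```python
import torch
import torch.nn as nn

class ECOCHead(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, code_bits: int = 32):
        super().__init__()
        self.bit_logits = nn.Linear(hidden_dim, code_bits)
        # Fixed random codebook: one code_bits-long binary codeword per token.
        self.register_buffer("codebook", torch.randint(0, 2, (vocab_size, code_bits)).float())

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        bit_probs = torch.sigmoid(self.bit_logits(h))          # (batch, code_bits)
        # Score each token by agreement between predicted bits and its codeword.
        scores = bit_probs @ self.codebook.t() + (1 - bit_probs) @ (1 - self.codebook).t()
        return scores.argmax(dim=-1)                            # predicted token ids
```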

Analysing Dropout and Compounding Errors in Neural Language Models

no code implementations 2 Nov 2018 James O' Neill, Danushka Bollegala

Moreover, we propose an extension of variational dropout to concrete dropout and curriculum dropout with varying schedules.

Language Modelling

Curriculum-Based Neighborhood Sampling For Sequence Prediction

no code implementations 16 Sep 2018 James O' Neill, Danushka Bollegala

At test time, a language model is required to make predictions given past predictions as input, instead of the past targets that are provided during training.

Language Modelling

Meta-Embedding as Auxiliary Task Regularization

no code implementations 16 Sep 2018 James O' Neill, Danushka Bollegala

For intrinsic task evaluation, supervision comes from various labeled word similarity datasets.

Self-Supervised Learning Sentence +3

Angular-Based Word Meta-Embedding Learning

no code implementations 13 Aug 2018 James O' Neill, Danushka Bollegala

This work compares meta-embeddings trained with different losses, namely loss functions that account for the angular distance between the reconstructed embedding and the target, and those that account for normalized distances based on vector length.

Meta-Learning Word Embeddings +1
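
A small sketch contrasting the two loss families mentioned above for reconstructing a meta-embedding: an angular (cosine-distance) loss versus a length-normalized Euclidean loss. The exact formulations in the paper may differ.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # 1 - cosine similarity: depends only on the angle between the two vectors.
    return (1 - F.cosine_similarity(pred, target, dim=-1)).mean()

def normalized_l2_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Euclidean distance after rescaling both vectors to unit length.
    diff = F.normalize(pred, dim=-1) - F.normalize(target, dim=-1)
    return diff.pow(2).sum(dim=-1).mean()
```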

Siamese Capsule Networks

no code implementations ICLR 2019 James O' Neill

Capsule Networks have shown encouraging results on de facto benchmark computer vision datasets such as MNIST, CIFAR and smallNORB.

Face Verification Few-Shot Learning

Dropping Networks for Transfer Learning

no code implementations 23 Apr 2018 James O' Neill, Danushka Bollegala

We also compare against models that are fully trained on the target task in the standard supervised learning setup.

Few-Shot Learning Natural Language Inference +2
