Search Results for author: Ibrahim Alabdulmohsin

Found 25 papers, 10 papers with code

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

no code implementations7 Mar 2024 Ibrahim Alabdulmohsin, Xiao Wang, Andreas Steiner, Priya Goyal, Alexander D'Amour, Xiaohua Zhai

Interestingly, data and architectural improvements seem to mitigate the negative impact of data balancing on performance; e.g., applying M4 to SigLIP-B/16 with data quality filters improves COCO image-to-text retrieval @5 from 86% (without data balancing) to 87%, and ImageNet 0-shot classification from 77% to 77.5%!

Image-to-Text Retrieval · Retrieval +1

Fractal Patterns May Unravel the Intelligence in Next-Token Prediction

no code implementations2 Feb 2024 Ibrahim Alabdulmohsin, Vinh Q. Tran, Mostafa Dehghani

We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown.
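
The paper's formalism is not reproduced here, but one standard way to quantify long-range, self-similar structure in a token-level signal is rescaled-range (R/S) analysis of per-token surprisal. The sketch below is a minimal illustration under that assumption; the surprisal input and the R/S estimator are illustrative choices, not the paper's exact procedure.

    import numpy as np

    def hurst_rs(x, min_chunk=16):
        """Estimate the Hurst exponent of a 1-D series via rescaled-range (R/S) analysis."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        sizes, rs_vals = [], []
        size = min_chunk
        while size <= n // 2:
            rs_per_chunk = []
            for start in range(0, n - size + 1, size):
                chunk = x[start:start + size]
                dev = np.cumsum(chunk - chunk.mean())   # cumulative deviation from the chunk mean
                r = dev.max() - dev.min()               # range of the cumulative deviations
                s = chunk.std()
                if s > 0:
                    rs_per_chunk.append(r / s)
            if rs_per_chunk:
                sizes.append(size)
                rs_vals.append(np.mean(rs_per_chunk))
            size *= 2
        # Slope of log(R/S) versus log(window size) gives the Hurst exponent.
        slope, _ = np.polyfit(np.log(sizes), np.log(rs_vals), 1)
        return slope

    # Toy usage: per-token surprisal (negative log-probability) from a language model,
    # simulated here with random data for illustration only.
    surprisal = np.random.rand(10_000)
    print(f"Estimated Hurst exponent: {hurst_rs(surprisal):.2f}")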

Adapting to Latent Subgroup Shifts via Concepts and Proxies

no code implementations21 Dec 2022 Ibrahim Alabdulmohsin, Nicole Chiou, Alexander D'Amour, Arthur Gretton, Sanmi Koyejo, Matt J. Kusner, Stephen R. Pfohl, Olawale Salaudeen, Jessica Schrouff, Katherine Tsai

We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target.

Unsupervised Domain Adaptation

Layer-Stack Temperature Scaling

no code implementations18 Nov 2022 Amr Khalifa, Michael C. Mozer, Hanie Sedghi, Behnam Neyshabur, Ibrahim Alabdulmohsin

Inspired by this, we show that extending temperature scaling across all layers improves both calibration and accuracy.
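
A minimal sketch of the idea, assuming auxiliary per-layer logit heads evaluated on a held-out validation set; fitting one temperature per layer and averaging the scaled probabilities are illustrative assumptions, not necessarily the paper's exact recipe.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def fit_temperature(logits, labels):
        """Find the temperature T > 0 that minimizes validation NLL for one set of logits."""
        def nll(log_t):
            t = np.exp(log_t)  # parameterize T = exp(log_t) to keep it positive
            probs = softmax(logits / t)
            return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
        res = minimize_scalar(nll, bounds=(-3, 3), method="bounded")
        return np.exp(res.x)

    # Hypothetical per-layer logits: layer_logits[l] has shape (n_val, n_classes),
    # e.g. from auxiliary classifier heads attached to each layer of the network.
    def fit_layer_temperatures(layer_logits, labels):
        return [fit_temperature(lg, labels) for lg in layer_logits]

    # Calibrated prediction: average the temperature-scaled probabilities across layers
    # (one simple combination rule; the paper's may differ).
    def predict(layer_logits, temps):
        probs = [softmax(lg / t) for lg, t in zip(layer_logits, temps)]
        return np.mean(probs, axis=0)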

Revisiting Neural Scaling Laws in Language and Vision

1 code implementation13 Sep 2022 Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai

The remarkable progress in deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules.

Image Classification · Language Modelling +3

A Reduction to Binary Approach for Debiasing Multiclass Datasets

1 code implementation31 May 2022 Ibrahim Alabdulmohsin, Jessica Schrouff, Oluwasanmi Koyejo

We propose a novel reduction-to-binary (R2B) approach that enforces demographic parity for multiclass classification with non-binary sensitive attributes via a reduction to a sequence of binary debiasing tasks.
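
A hedged sketch of the reduction idea: treat the multiclass problem one class at a time, and within each binary sub-problem shift per-group scores so that every group is assigned that class at a similar rate. The quantile-matching heuristic below is an illustrative stand-in, not the paper's R2B algorithm.

    import numpy as np

    def group_thresholds(scores, groups, target_rate):
        """Per-group score thresholds so each group is assigned the class at roughly `target_rate`."""
        return {g: np.quantile(scores[groups == g], 1.0 - target_rate)
                for g in np.unique(groups)}

    def r2b_style_debias(probs, groups):
        """Post-process multiclass scores one class at a time (a rough quantile-matching heuristic).

        probs  : (n, k) array of predicted class probabilities
        groups : (n,) array with the sensitive-attribute value of each example
        """
        n, k = probs.shape
        preds = probs.argmax(axis=1)
        adjusted = np.empty_like(probs)
        for c in range(k):                              # one binary sub-problem per class
            target = (preds == c).mean()                # overall rate of predicting class c
            thresholds = group_thresholds(probs[:, c], groups, target)
            for g, t in thresholds.items():
                mask = groups == g
                adjusted[mask, c] = probs[mask, c] - t  # positive means "above this group's threshold"
        return adjusted.argmax(axis=1)

    # Toy usage with random scores and two groups.
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(4), size=1000)
    groups = rng.integers(0, 2, size=1000)
    print(np.bincount(r2b_style_debias(probs, groups), minlength=4))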

Fair Wrapping for Black-box Predictions

1 code implementation31 Jan 2022 Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias.

Fairness

Improving the Post-hoc Calibration of Modern Neural Networks with Probe Scaling

no code implementations29 Sep 2021 Amr Khalifa, Ibrahim Alabdulmohsin

We present "probe scaling": a post-hoc recipe for calibrating the predictions of modern neural networks.

The Impact of Reinitialization on Generalization in Convolutional Neural Networks

no code implementations1 Sep 2021 Ibrahim Alabdulmohsin, Hartmut Maennel, Daniel Keysers

Recent results suggest that reinitializing a subset of the parameters of a neural network during training can improve generalization, particularly for small training sets.

Generalization Bounds · Image Classification +1
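
A minimal sketch of the training pattern described above, assuming a toy PyTorch model; the architecture, the reinitialization schedule, and the choice to reset only the final layer are illustrative assumptions, not the paper's experimental setup.

    import torch.nn as nn

    # Toy CNN; which layers to reinitialize and how often are illustrative choices.
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 10),
    )

    def reinitialize(module):
        """Reset the parameters of one submodule to a fresh random initialization."""
        for m in module.modules():
            if hasattr(m, "reset_parameters"):
                m.reset_parameters()

    EPOCHS, REINIT_EVERY = 30, 10
    for epoch in range(EPOCHS):
        # ... one epoch of standard training here ...
        if (epoch + 1) % REINIT_EVERY == 0 and epoch + 1 < EPOCHS:
            reinitialize(model[-1])  # reinitialize only the final linear layer (a subset of parameters)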

A Generalized Lottery Ticket Hypothesis

no code implementations3 Jul 2021 Ibrahim Alabdulmohsin, Larisa Markeeva, Daniel Keysers, Ilya Tolstikhin

We introduce a generalization to the lottery ticket hypothesis in which the notion of "sparsity" is relaxed by choosing an arbitrary basis in the space of parameters.
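
A hedged sketch of sparsity in a non-canonical basis: express the flattened weights in a DCT basis, keep only the largest coefficients, and map back to parameter space. The DCT is just one concrete basis chosen here for illustration; the paper considers arbitrary bases, and canonical lottery-ticket pruning corresponds to the standard coordinate basis.

    import numpy as np
    from scipy.fft import dct, idct

    def sparsify_in_basis(w, keep_fraction=0.1):
        """Keep only the largest coefficients of w expressed in a DCT basis."""
        flat = w.ravel()
        coeffs = dct(flat, norm="ortho")                   # change of basis
        k = max(1, int(keep_fraction * coeffs.size))
        idx = np.argpartition(np.abs(coeffs), -k)[-k:]     # indices of the k largest coefficients
        pruned = np.zeros_like(coeffs)
        pruned[idx] = coeffs[idx]
        return idct(pruned, norm="ortho").reshape(w.shape)  # back to parameter space

    # Toy usage on a random weight matrix.
    w = np.random.randn(128, 64)
    w_sparse_basis = sparsify_in_basis(w, keep_fraction=0.05)
    print(np.linalg.norm(w - w_sparse_basis) / np.linalg.norm(w))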

A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models

1 code implementation NeurIPS 2021 Ibrahim Alabdulmohsin, Mario Lucic

We present a scalable post-processing algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk.

BIG-bench Machine Learning

A Near-Optimal Recipe for Debiasing Trained Machine Learning Models

no code implementations1 Jan 2021 Ibrahim Alabdulmohsin, Mario Lucic

We present an efficient and scalable algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk.

BIG-bench Machine Learning · Classification +1

What Do Neural Networks Learn When Trained With Random Labels?

no code implementations NeurIPS 2020 Hartmut Maennel, Ibrahim Alabdulmohsin, Ilya Tolstikhin, Robert J. N. Baldock, Olivier Bousquet, Sylvain Gelly, Daniel Keysers

We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch even after accounting for simple effects, such as weight scaling.

Memorization

Fair Classification via Unconstrained Optimization

no code implementations21 May 2020 Ibrahim Alabdulmohsin

In addition, it can accommodate many fairness criteria that have been previously proposed in the literature, such as equalized odds and statistical parity.

Binary Classification · Classification +2
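
For reference, the two criteria named in the excerpt have the following standard textbook definitions for a predictor \hat{Y}, label Y, and sensitive attribute A (the notation is generic, not necessarily the paper's):

    Statistical parity:  \Pr(\hat{Y}=1 \mid A=a) = \Pr(\hat{Y}=1 \mid A=a')  \quad \forall\, a, a'
    Equalized odds:      \Pr(\hat{Y}=1 \mid Y=y,\, A=a) = \Pr(\hat{Y}=1 \mid Y=y,\, A=a')  \quad \forall\, a, a',\ y \in \{0,1\}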

Information Theoretic Guarantees for Empirical Risk Minimization with Applications to Model Selection and Large-Scale Optimization

no code implementations ICML 2018 Ibrahim Alabdulmohsin

In this paper, we derive bounds on the mutual information of the empirical risk minimization (ERM) procedure for both 0-1 and strongly-convex loss classes.

Learning Theory · Model Selection
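
To illustrate the role such a mutual-information quantity plays, a representative bound of this family (in the style of Xu and Raginsky, 2017) for a \sigma-subgaussian loss is shown below; it is not necessarily the paper's exact statement for the 0-1 or strongly-convex cases:

    \bigl| \mathbb{E}[\, R(W) - \hat{R}_S(W) \,] \bigr| \;\le\; \sqrt{\frac{2\sigma^2 \, I(S; W)}{n}}

where S is the training sample of size n, W is the hypothesis returned by ERM, R is the population risk, and \hat{R}_S is the empirical risk.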

Uniform Generalization, Concentration, and Adaptive Learning

no code implementations22 Aug 2016 Ibrahim Alabdulmohsin

Mathematically, this requires that the learning algorithm enjoys a small generalization risk, which is defined either in expectation or in probability.

Learning Theory
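
The two notions referenced in the excerpt have the following standard forms (notation is generic, not the paper's): for a hypothesis h returned by the algorithm on a sample S of size n,

    In expectation:  \bigl| \mathbb{E}_{S,\,h}[\, R(h) - \hat{R}_S(h) \,] \bigr| \le \epsilon
    In probability:  \Pr\bigl( | R(h) - \hat{R}_S(h) | > \epsilon \bigr) \le \delta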

A Mathematical Theory of Learning

no code implementations7 May 2014 Ibrahim Alabdulmohsin

Depending on the hypothesis space and how the final hypothesis is selected, we show that a learning process can be assigned a numeric score, called the learning capacity, which is analogous to Shannon's channel capacity and satisfies similar properties, such as the data-processing inequality and the information-cannot-hurt inequality.

Clustering · Learning Theory
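
For comparison, Shannon's channel capacity is the supremum of a mutual information over input distributions; the learning capacity is defined analogously as a supremum of a dependence measure between the training sample S and the learned hypothesis H. The schematic form below only illustrates the analogy; the exact dependence measure is defined in the paper:

    C_{\text{channel}} = \sup_{p(X)} I(X; Y), \qquad C_{\text{learn}} = \sup_{\text{data distributions}} \mathcal{I}(S; H)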
