In this work we examine how fine-tuning impacts the fairness of contrastive Self-Supervised Learning (SSL) models.
While state-of-the-art contrastive SSL models produce results competitive with their supervised counterparts, they lack the ability to infer latent variables.
Despite the success of a number of recent techniques for visual self-supervised deep learning, there has been limited investigation into the representations that are ultimately learned.
To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.
In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.
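One common way to realise such a combination is to keep separate, fine-grained encoders per modality and map their outputs into a single shared space with lightweight projection heads. The sketch below is illustrative only; the module names, dimensions, and normalisation choice are assumptions, not the architecture described here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch (assumed design, not the paper's architecture):
# per-modality features stay fine-grained upstream, while projection heads
# map video, audio, and text into one shared embedding space.
class SharedEmbeddingHeads(nn.Module):
    def __init__(self, video_dim=1024, audio_dim=512, text_dim=768, shared_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, shared_dim)
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)

    def forward(self, video_feats, audio_feats, text_feats):
        # L2-normalise so modalities are comparable under dot-product similarity.
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        a = F.normalize(self.audio_proj(audio_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return v, a, t
```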
Modern neural network training relies on piece-wise (sub-)differentiable functions in order to use backpropagation to update model parameters.
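As a concrete illustration of this point, the snippet below backpropagates through ReLU, a piecewise function that is non-differentiable at zero; the framework falls back to a subgradient there. This is a minimal sketch using PyTorch's autograd, chosen here only as an example framework.

```python
import torch

# Minimal sketch: backpropagation through a piecewise (sub-)differentiable
# function. ReLU has no derivative at x = 0, so autograd uses a subgradient.
x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
y = torch.relu(x).sum()
y.backward()
print(x.grad)  # tensor([0., 0., 1.]) -- PyTorch picks 0 as the subgradient at x = 0
```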
Image classification with deep neural networks is typically restricted to images with small spatial dimensions, such as 224 x 224 in ResNet models.
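The sketch below shows the standard preprocessing this restriction implies: a high-resolution input is downsampled to 224 x 224 before being fed to a ResNet classifier. The specific model, image size, and torchvision calls are assumptions made for illustration.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Hedged sketch of the usual pipeline: large images are resized to 224 x 224
# before classification, which is the restriction referred to in the text.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

model = models.resnet50(weights=None)   # untrained ResNet-50, for illustration only
image = Image.new("RGB", (4000, 3000))  # stand-in for a high-resolution input
logits = model(preprocess(image).unsqueeze(0))
print(logits.shape)                     # torch.Size([1, 1000])
```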
Continual learning is the ability to sequentially learn over time by accommodating new knowledge while retaining previously learned experiences.
We propose a new method for input variable selection in nonlinear regression.
Lifelong learning is the problem of learning multiple consecutive tasks in a sequential manner, where knowledge gained from previous tasks is retained and used to aid future learning over the lifetime of the learner.