In this respect, the proposed SSL framework, MC-SSL0.0, is a step towards Multi-Concept Self-Supervised Learning (MC-SSL), which goes beyond modelling a single dominant label in an image to effectively utilise the information from all the concepts present in it.
Hence, most of the learning is independent of the image patches $(N)$ in the higher layers, and the class embedding is learned solely based on the Super tokens $(N/M^2)$ where $M^2$ is the window size.
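As a small worked example of the token counts mentioned above, the sketch below computes the number of super tokens $N/M^2$ for a given number of patches $N$ and window side $M$. The function name and the example sizes (196 patches, 7 x 7 windows) are illustrative assumptions, not values taken from the paper, and the sketch assumes the patches divide evenly into windows.

```python
def num_super_tokens(num_patches: int, window_side: int) -> int:
    """Return N / M^2: the number of super tokens for N patches and M x M windows.

    Hypothetical helper for illustration; assumes N is an exact multiple of M^2.
    """
    window_size = window_side * window_side  # M^2 patches per window
    if num_patches % window_size != 0:
        raise ValueError("number of patches must be a multiple of the window size")
    return num_patches // window_size


# Illustrative sizes only: a 14 x 14 patch grid (N = 196) with 7 x 7 windows (M = 7).
print(num_super_tokens(196, 7))  # 4
```

With these illustrative sizes, the class embedding would attend to 4 super tokens rather than 196 patches.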
First, we theoretically show the transferability of robustness from an adversarially trained teacher model to a student model with the help of mixup augmentation.
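For readers unfamiliar with mixup, the sketch below shows the standard formulation, in which two samples and their labels are blended with a weight $\lambda$ drawn from a $\mathrm{Beta}(\alpha, \alpha)$ distribution. It is a generic illustration only: the abstract does not say how mixup is combined with the teacher and student models, and the function name, the list-based inputs, and the default `alpha` are assumptions.

```python
import random


def mixup(x_i, x_j, y_i, y_j, alpha=1.0, rng=random):
    """Blend two samples and their one-hot labels with a Beta(alpha, alpha) weight.

    Generic mixup sketch; inputs are plain lists of floats of equal length.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


# Usage with toy two-feature inputs and two-class one-hot labels.
rng = random.Random(0)
x_mix, y_mix, lam = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], rng=rng)
print(x_mix, y_mix, lam)
```

The mixed label remains a valid probability vector because it is a convex combination of the two one-hot labels.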
Extensive Unsupervised Domain Adaptation (UDA) studies have shown great success in practice by learning transferable representations across a labeled source domain and an unlabeled target domain with deep models.
We also observed that SiT performs well in few-shot learning, and showed that it learns useful representations by simply training a linear classifier on top of the features learned by SiT.
One key ingredient of DCNN-based FR is the appropriate design of a loss function that ensures discrimination between various identities.
In this work, we present AXM-Net, a novel CNN-based architecture designed for learning semantically aligned visual and textual representations.
Ranked #3 on Text based Person Retrieval on CUHK-PEDES
Compared with existing loss functions, the lower gradient of the proposed loss function leads to the convergence of SGD to a better optimum point, and consequently a better generalisation.
Deep neural networks have enhanced the performance of decision-making systems in many applications, including image understanding, and further gains can be achieved by constructing ensembles.
However, in real-world surveillance scenarios, frequently no visual information will be available about the queried person.
Among the plethora of techniques devised to curb the prevalence of noise in medical images, deep-learning-based approaches have shown the most promise.
We present a new loss function, namely Wing loss, for robust facial landmark localisation with Convolutional Neural Networks (CNNs).
Ranked #1 on Face Alignment on 300W (NME_inter-pupil (%, Common) metric)
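The Wing loss entry above does not give the formula, so the sketch below follows the commonly cited definition: $\mathrm{wing}(x) = w \ln(1 + |x|/\epsilon)$ for $|x| < w$, and $|x| - C$ otherwise, with $C = w - w \ln(1 + w/\epsilon)$ chosen so that the two pieces join continuously. The default values `w=10.0` and `epsilon=2.0` are illustrative assumptions, and the function operates on a single scalar error rather than on network outputs.

```python
import math


def wing_loss(x: float, w: float = 10.0, epsilon: float = 2.0) -> float:
    """Wing loss for one landmark coordinate error x (scalar sketch).

    Logarithmic for small errors (|x| < w), linear (L1-like) for large ones.
    """
    # Constant that makes the two pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / epsilon)
    abs_x = abs(x)
    if abs_x < w:
        return w * math.log(1.0 + abs_x / epsilon)
    return abs_x - c


# Small errors are amplified relative to L1; large errors grow linearly.
for err in (0.0, 0.5, 5.0, 20.0):
    print(err, wing_loss(err))
```

In a training setting this would typically be averaged over all landmark coordinates in a batch; that reduction is omitted here for brevity.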
Deep learning is successfully used as a tool for machine learning, where a neural network is capable of automatically learning features.
In this paper, we show how a 3D Morphable Model (i.e. a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network.
The framework has four stages: face detection, bounding box aggregation, pose estimation and landmark localisation.
The learned features and the classification results are used to retrieve medical images.