In this work, we address domain generalization with MixStyle, a plug-and-play, parameter-free module that is simply inserted to shallow CNN layers and requires no modification to training objectives.
Gradient-based meta-learning and hyperparameter optimization have seen significant progress recently, enabling practical end-to-end training of neural networks together with many hyperparameters.
The breakthrough of contrastive learning (CL) has fueled the recent success of self-supervised learning (SSL) in high-level vision tasks on RGB images.
Calibration of neural networks is a topical problem that is becoming increasingly important for real-world use of neural networks.
Current supervised sketch-based image retrieval (SBIR) methods achieve excellent performance.
The training of deep neural networks (DNNs) is usually memory-hungry due to the limited device memory capacity of DNN accelerators.
With this meta-learning framework, our model can not only disentangle the cross-modal shared semantic content for SBIR, but can adapt the disentanglement to any unseen user style as well, making the SBIR model truly style-agnostic.
Analysis of human sketches in deep learning has advanced immensely through the use of waypoint-sequences rather than raster-graphic representations.
This data is uniquely characterised by its existence in dual modalities of rasterized images and vector coordinate sequences.
A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is the data scarcity -- model performances are largely bottlenecked by the lack of sketch-photo pairs.
We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators.
Ranked #1 on Layout-to-Image Generation on COCO-Stuff 128x128
However Visual Relationship Prediction (VRP) also provides a more challenging test of image understanding than conventional image tagging, and is difficult to learn due to a large label-space and incomplete annotation.
In this paper, we study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail -- a person typically sketches up to various extents of detail to depict an object.
The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process.
In particular, we study the problem of label distillation - creating synthetic labels for a small set of real images, and show it to be more effective than the prior image-based approach to dataset distillation.
While theoretically appealing, the application of the Wasserstein distance to large-scale machine learning problems has been hampered by its prohibitive computational cost.
In DG this means encountering a sequence of domains and at each step training to maximise performance on the next domain.
This is achieved by having a learning objective formulated to ensure that the generated data can be correctly classified by the label classifier while fooling the domain classifier.
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks.
In this paper, we propose Differentiable Automatic Data Augmentation (DADA) which dramatically reduces the cost.
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch.
Compared with the benchmarked models, our model has the lowest tracking error, across a range of portfolio sizes.
We address the problem of simultaneously learning a k-means clustering and deep feature representation from unlabelled data, which is of interest due to the potential of deep k-means to outperform traditional two-step feature extraction and shallow-clustering strategies.
An effective person re-identification (re-ID) model should learn feature representations that are both discriminative, for distinguishing similar-looking people, and generalisable, for deployment across datasets without any adaptation.
In the former one asks whether a machine can `understand' enough about the meaning of input data to produce a meaningful but more compact abstraction.
As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales.
Ranked #7 on Person Re-Identification on CUHK03
This is one of the very first studies which discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training.
The well known domain shift issue causes model performance to degrade when deployed to a new target domain with different statistics to training.
In this paper, we build on this strong baseline by designing an episodic training procedure that trains a single deep network in a way that exposes it to the domain shift that characterises a novel domain at runtime.
In this paper, a unified approach is presented to transfer learning that addresses several source and target domain label-space and annotation assumptions with a single model.
Ranked #17 on Unsupervised Domain Adaptation on Market to Duke
We thus propose a new deep comparison network comprised of embedding and relation modules that learn multiple non-linear distance metrics based on different levels of features simultaneously.
To advance subtle expression recognition, we contribute a Large-scale Subtle Emotions and Mental States in the Wild database (LSEMSW).
Human free-hand sketches have been studied in various contexts including sketch recognition, synthesis and fine-grained sketch-based image retrieval (FG-SBIR).
Once trained, a RN is able to classify images of new classes by computing relation scores between query images and the few examples of each new class without further updating the network.
In this paper, we make two main contributions: Firstly, we build upon the favorable domain shift-robust properties of deep learning methods, and develop a low-rank parameterized CNN model for end-to-end DG learning.
To solve this problem, we establish a theoretical equivalence between tensor optimisation and a two-stream gated neural network.
We propose to model complex visual scenes using a non-parametric Bayesian model learned from weakly labelled images abundant on media sharing sites such as Flickr.
Generating natural language descriptions of images is an important capability for a robot or other visual-intelligence driven AI agent that may need to communicate with human users about what it is seeing.
We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples.
In this chapter, we propose a single framework that unifies multi-domain learning (MDL) and the related but better studied area of multi-task learning (MTL).
We propose a neural network approach to price EU call options that significantly outperforms some existing pricing models and comes with guarantees that its predictions are economically reasonable.
This allows a recognition model to be pre-calibrated for a new domain in advance (e. g., future time or view angle) without waiting for data collection and re-training.
To train such networks, very large training sets are needed with millions of labeled images.
Most visual recognition methods implicitly assume the data distribution remains unchanged from training to testing.
In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible.
When humans describe images they tend to use combinations of nouns and adjectives, corresponding to objects and their associated attributes respectively.
Recently, zero-shot learning (ZSL) has received increasing interest.
Zero-shot learning has received increasing interest as a means to alleviate the often prohibitive expense of annotating training data for large scale recognition problems.
We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans.
In this paper, we provide a new neural-network based perspective on multi-task learning (MTL) and multi-domain learning (MDL).