In this work, we propose a novel two-stage framework for improving domain adaptation techniques.
Ranked #3 on Unsupervised Domain Adaptation on SYNTHIA-to-Cityscapes (using extra training data)
The standard benchmark for this setting, KITTI-10m, has essentially been saturated by recent algorithms: many of them achieve near-perfect recall.
We study the 2-D super-resolution multi-reference alignment (SR-MRA) problem: estimating an image from its down-sampled, circularly-translated, and noisy copies.
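To make the setting concrete (the notation here is generic, not necessarily the paper's): writing $D$ for the down-sampling operator and $T_s$ for a circular translation by $s$, the observations take the form

$$ y_j = D\,T_{s_j} x + \varepsilon_j, \qquad j = 1, \dots, N, $$

where the translations $s_j$ are unknown nuisance parameters, $\varepsilon_j$ is noise, and the goal is to recover the image $x$ from $\{y_j\}$ alone.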
Generative Adversarial Networks (GANs) are popular frameworks for generating high-quality data, and are widely used in both academia and industry across many domains.
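For context, the original GAN formulation of Goodfellow et al. trains a generator $G$ and a discriminator $D$ via the minimax game

$$ \min_G \max_D \;\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big], $$

where $p_z$ is a simple noise distribution from which the generator draws its inputs.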
Neural implicit fields are quickly emerging as an attractive representation for learning-based techniques.
Fictional languages have become increasingly popular in recent years, appearing in novels, movies, TV shows, comics, and video games.
In this work, we opt to learn the weighting function by training a neural network on the control points of a single input shape, exploiting the innate smoothness of neural networks.
Video reconstruction from a single motion-blurred image is a challenging problem, which can enhance existing cameras' capabilities.
1 code implementation • Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky
The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.
The method drapes the source mesh over the target geometry and at the same time seeks to preserve the carefully designed characteristics of the source mesh.
A leading defense against such attacks is adversarial training, a technique in which a DNN is trained to be robust to adversarial attacks by introducing adversarial noise to its input.
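One classic way to craft such adversarial noise is the fast gradient sign method (FGSM). The sketch below is purely illustrative — a toy logistic-regression "network" with made-up weights, not the paper's model — showing how a small signed-gradient step lowers the model's confidence on a correctly classified input:

```python
import numpy as np

# Toy linear "network": logistic regression with fixed, made-up weights.
w = np.array([1.0, -2.0])
b = 0.1

def predict_prob(x):
    """Probability of the positive class under the linear model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# A confidently classified positive example.
x = np.array([2.0, -1.0])
y = 1.0

# Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
grad_x = (predict_prob(x) - y) * w

# FGSM-style perturbation: a small step in the direction that increases the loss.
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

# The perturbed input is classified less confidently; adversarial training
# would add (x_adv, y) back into the training set.
assert predict_prob(x_adv) < predict_prob(x)
```

Adversarial training repeats this inner step during optimization, so the network learns to classify both $x$ and $x_{\text{adv}}$ correctly.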
MISS GAN is both input-image specific and able to exploit information from other images, using only a single trained model.
We compare our model to state-of-the-art methods that are not ep-free and show that in the absence of camera parameters, we outperform them by a large margin while obtaining comparable results when camera parameters are available.
Ranked #10 on 3D Human Pose Estimation on Human3.6M
no code implementations • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky
In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.
Multilayer perceptrons (MLPs) are known to struggle with learning high-frequency functions, in particular functions with wide frequency bands.
In the first one, where no explicit prior is used, we show that the proposed approach outperforms other internal learning methods, such as DIP.
We then train a model that is applied directly to the RAW data by using knowledge distillation such that the model predictions for RAW images will be aligned with the predictions of an off-the-shelf pre-trained model for processed RGB images.
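A common form of knowledge-distillation loss is the KL divergence between the temperature-softened teacher and student output distributions. The sketch below is illustrative only — the logits are hypothetical, not from the paper's RAW/RGB models:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence KL(teacher || student) on softened distributions."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))

# Hypothetical logits: the teacher sees the processed RGB image,
# the student sees the corresponding RAW input.
teacher = [4.0, 1.0, 0.5]
aligned = [3.8, 1.1, 0.4]   # student predictions close to the teacher's
off     = [0.2, 3.5, 1.0]   # student predictions far from the teacher's

assert distillation_loss(aligned, teacher) < distillation_loss(off, teacher)
```

Minimizing this loss over RAW inputs pushes the student's predictions toward the teacher's, which is the alignment the snippet above describes.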
We model the output of the encoder latent space via a GMM, which leads to both good clustering using this latent space and improved image generation by the GAN.
A very practical example of C2FS is when the target classes are sub-classes of the training classes.
Neural Architecture Search (NAS) has been used recently to achieve improved performance in various tasks and most prominently in image classification.
Learning and synthesizing on local geometric patches enables a genus-oblivious framework, facilitating texture transfer between shapes of different genus.
Recently, several works have considered a back-projection (BP) based fidelity term as an alternative to the common least squares (LS), and demonstrated excellent results for popular inverse problems.
This work suggests using sampling theory to analyze the function space represented by neural networks.
We present PointGMM, a neural network that learns to generate hGMMs which are characteristic of the shape class, and also coincide with the input point cloud.
1 code implementation • 15 Mar 2020 • Leonid Karlinsky, Joseph Shtok, Amit Alfassy, Moshe Lichtenstein, Sivan Harary, Eli Schwartz, Sivan Doveh, Prasanna Sattigeri, Rogerio Feris, Alexander Bronstein, Raja Giryes
Few-shot detection and classification have advanced significantly in recent years.
The field of Few-Shot Learning (FSL), or learning from very few (typically $1$ or $5$) examples per novel class (unseen during training), has received a lot of attention and significant performance advances in the recent literature.
An autoencoder is a specific type of neural network mainly designed to encode the input into a compressed, meaningful representation, and then decode it back such that the reconstructed input is as similar as possible to the original one.
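A minimal sketch of this idea, assuming a linear autoencoder with tied weights trained by plain gradient descent on synthetic data (pure NumPy, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in R^8 that actually lie near a 2-D subspace.
Z = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 8))
X = Z @ A + 0.01 * rng.normal(size=(200, 8))

# Linear autoencoder with tied weights: encoder W, decoder W.T.
W = 0.1 * rng.normal(size=(8, 2))    # 8-D input -> 2-D code

def recon_error(W):
    X_hat = (X @ W) @ W.T            # encode, then decode
    return float(np.mean((X - X_hat) ** 2))

err_before = recon_error(W)
lr = 0.01
for _ in range(500):
    E = (X @ W) @ W.T - X            # reconstruction residual
    # Gradient of the mean squared reconstruction error w.r.t. W.
    grad = 2.0 * (X.T @ E @ W + E.T @ X @ W) / X.size
    W -= lr * grad
err_after = recon_error(W)

assert err_after < err_before        # training shrinks the reconstruction error
```

Deep autoencoders replace the two linear maps with nonlinear encoder and decoder networks, but the objective — reconstruct the input through a compressed code — is the same.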
Deep neural networks are a very powerful tool for many computer vision tasks, including image restoration, exhibiting state-of-the-art results.
We treat the problem of color enhancement as an image translation task, which we tackle using both supervised and unsupervised learning.
In this work, we propose an alternative strategy for GAN search by using a method called DEGAS (Differentiable Efficient GenerAtor Search), which focuses on efficiently finding the generator in the GAN.
Ranked #13 on Image Generation on STL-10
In this work, we propose to employ tools inspired by the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting.
For a known kernel, we design a closed-form correction filter that modifies the low-resolution image to match one which is obtained by another kernel (e.g., bicubic), and thus improves the results of existing pre-trained DNNs.
This term encourages agreement between the projection of the optimization variable onto the row space of the linear operator and the pseudo-inverse of the linear operator ("back-projection") applied on the observations.
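In generic notation (with $A$ the linear operator, $A^{\dagger}$ its pseudo-inverse, and $y$ the observations — symbols assumed here, not taken from the paper), this back-projection term can be written as

$$ \ell_{\mathrm{BP}}(x) \;=\; \tfrac{1}{2}\,\big\| A^{\dagger} A x - A^{\dagger} y \big\|_2^2 \;=\; \tfrac{1}{2}\,\big\| A^{\dagger} (A x - y) \big\|_2^2, $$

in contrast to the least-squares term $\tfrac{1}{2}\|Ax - y\|_2^2$; note that $A^{\dagger}A$ is exactly the orthogonal projection onto the row space of $A$.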
In recent years, there has been a significant improvement in the quality of samples produced by (deep) generative models such as variational auto-encoders and generative adversarial networks.
Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems.
Ranked #8 on Few-Shot Image Classification on Mini-ImageNet - 1-Shot Learning (using extra training data)
In this paper, we propose a differentiable search space that allows the annealing of architecture weights, while gradually pruning inferior operations.
Taco-VC is implemented using a single speaker Tacotron synthesizer based on Phonetic PosteriorGrams (PPGs) and a single speaker WaveNet vocoder conditioned on mel spectrograms.
Many images shared over the web include overlaid objects, or visual motifs, such as text, symbols or drawings, which add a description or decoration to the image.
We focus on the task of semi-supervised transfer learning, in which unlabeled samples from the target dataset are available during the network training on the source dataset.
We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning.
This has the disadvantage that the reconstructed image no longer obeys the sparsity prior used in the processing.
After this preliminary training, and after replacing the last layer of the network with new ones, we have designed an automatic classifier for the correct cell type (healthy / primary cancer / metastatic cancer) with 90-99% accuracy, even though training sets as small as a few images were used.
While deep neural networks exhibit state-of-the-art results in the task of image super-resolution (SR) with a fixed known acquisition process (e.g., a bicubic downscaling kernel), they experience a huge performance loss when the real observation model mismatches the one used in training.
While most works use uniform quantizers for both parameters and activations, the uniform quantizer is not always optimal, and a non-uniform quantizer needs to be considered.
Convolutional Neural Networks (CNNs) are very popular in many fields, including computer vision, speech recognition, and natural language processing, to name a few.
In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes.
Deep learning models have lately shown great performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing.
Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category just by seeing a few examples from it.
Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples.
Moreover, compared to the linear classifiers SVM and LR, the behavior of DNNs is largely the same on the training and test data, regardless of whether the network generalizes.
We present a novel method for neural network quantization that emulates a non-uniform $k$-quantile quantizer, which adapts to the distribution of the quantized parameters.
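One way a $k$-quantile quantizer can be sketched (illustrative NumPy code, not the paper's implementation): place the bin edges at the empirical quantiles of the values, so each quantization level covers an equal share of the weights, and represent each bin by the mean of its members.

```python
import numpy as np

def quantile_quantize(w, k=4):
    """Quantize values to k levels adapted to their own distribution."""
    w = np.asarray(w, dtype=float)
    # Interior bin edges at the 1/k, 2/k, ... quantiles of the data.
    edges = np.quantile(w, np.linspace(0, 1, k + 1)[1:-1])
    bins = np.digitize(w, edges)               # bin index 0..k-1 per value
    # Represent each bin by the mean of the values assigned to it.
    levels = np.array([w[bins == i].mean() for i in range(k)])
    return levels[bins], levels

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)                # stand-in for network weights
q, levels = quantile_quantize(weights, k=4)

assert len(np.unique(q)) <= 4                  # at most k distinct levels
assert np.all(np.isin(q, levels))
```

Because the levels follow the empirical distribution, dense regions of the weight histogram get finer resolution than a uniform grid would give them.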
We investigate the classification performance of K-nearest neighbors (K-NN) and deep neural networks (DNNs) in the presence of label noise.
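The phenomenon can be illustrated with a toy experiment (synthetic data, not the paper's setup): under symmetric label noise, K-NN with a large enough $k$ averages the flipped labels away via majority voting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes in 2-D.
n = 200
X = np.vstack([rng.normal(loc=-2.0, size=(n, 2)),
               rng.normal(loc=+2.0, size=(n, 2))])
y = np.array([0] * n + [1] * n)

# Inject symmetric label noise: flip 20% of the training labels.
noisy = y.copy()
flip = rng.random(2 * n) < 0.2
noisy[flip] = 1 - noisy[flip]

def knn_predict(Xtr, ytr, Xte, k=15):
    """Plain K-NN with majority vote among the k nearest training points."""
    preds = []
    for x in Xte:
        d = np.sum((Xtr - x) ** 2, axis=1)
        nn = np.argsort(d)[:k]
        preds.append(np.bincount(ytr[nn], minlength=2).argmax())
    return np.array(preds)

# Clean test set from the same distribution.
Xte = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
                 rng.normal(loc=+2.0, size=(100, 2))])
yte = np.array([0] * 100 + [1] * 100)

acc = np.mean(knn_predict(X, noisy, Xte) == yte)
assert acc > 0.85   # majority voting averages out the flipped labels
```

With $k=15$ neighbors and a 20% flip rate, a majority of flipped neighbors is a low-probability event, so accuracy stays close to the noise-free level.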
Recently there has been a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for representation learning and classification.
In this work, we propose an alternative method for solving inverse problems using off-the-shelf denoisers, which requires less parameter tuning.
We further show that a significant boost in performance of up to $0.4$ dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class.
We show that whereas the generalization error of a non-invariant classifier is proportional to the complexity of the input space, the generalization error of an invariant classifier is proportional to the complexity of the base space.
In this work we suggest a novel method for coupling Gaussian denoising algorithms to Poisson noisy inverse problems, which is based on a general approach termed "Plug-and-Play".
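One standard instantiation of the "Plug-and-Play" idea (the ADMM-based scheme; generic notation, not necessarily the paper's): with $f(x)$ the Poisson negative log-likelihood of the observations and $\mathcal{D}_\sigma$ an off-the-shelf Gaussian denoiser, iterate

$$ x^{k+1} = \arg\min_x\; f(x) + \tfrac{\rho}{2}\big\|x - (v^k - u^k)\big\|_2^2, \qquad v^{k+1} = \mathcal{D}_\sigma\big(x^{k+1} + u^k\big), \qquad u^{k+1} = u^k + x^{k+1} - v^{k+1}, $$

so the denoiser stands in for the proximal step of an explicit prior, which is what allows Gaussian denoisers to be coupled to Poisson noisy inverse problems.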
Three important properties of a classification machinery are: (i) the system preserves the core information of the input data; (ii) the training examples convey information about unseen data; and (iii) the system is able to treat differently points from different classes.
In particular, we formally prove in the longer version that DNNs with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data.
Two complementary approaches have been extensively used in signal and image processing, leading to novel results: the sparse representation methodology and the variational strategy.