Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics.

Finally, we design a BaB framework, named Branch and Dual Network Bound (BaDNB), based on our novel bounding and branching algorithms.

Experiments on CIFAR-10 against $\ell_2$ and $\ell_\infty$ norm-bounded perturbations demonstrate that BYORL achieves near state-of-the-art robustness with as few as 500 labeled examples.

Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.

We provide experimental results on the ColorMnist and CelebA benchmark datasets that quantify the properties of the learned representations and compare the approach with a baseline that is specifically trained for the desired property.

In this paper, we introduce ReSWAT (Resilient Signal Watermarking via Adversarial Training), a framework for learning transformation-resilient watermark detectors that are able to detect a watermark even after a signal has been through several post-processing transformations.

From this perspective, we hypothesise that instabilities in training GANs arise from the integration error in discretising the continuous dynamics.
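
A minimal toy illustration of this hypothesis (not taken from the paper): for the bilinear zero-sum game $f(x, y) = xy$, the continuous gradient flow merely rotates $(x, y)$ and conserves $x^2 + y^2$, while its explicit-Euler discretisation (simultaneous gradient descent/ascent) inflates that quantity at every step, so the iterates spiral outwards.

```python
def simultaneous_gd(x, y, lr, steps):
    """Explicit-Euler discretisation of the gradient flow for min_x max_y x*y."""
    for _ in range(steps):
        gx, gy = y, x                     # grad_x(xy) = y, grad_y(xy) = x
        x, y = x - lr * gx, y + lr * gy   # descent in x, ascent in y
    return x, y

# Each discrete step multiplies x^2 + y^2 by (1 + lr^2), whereas the
# continuous dynamics conserve it exactly: the integration error accumulates.
x, y = simultaneous_gd(1.0, 1.0, lr=0.1, steps=100)
```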

In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations, and (2) requires only a fixed number of forward/backward passes through the network per iteration.

In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art).

Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems.

This is notable because our system is not a bespoke system designed specifically to solve intelligence tests, but a general-purpose system that was designed to make sense of any sensory sequence.

We study the problem of learning efficient algorithms that strongly generalize in the framework of neural program induction.

Neural networks are widely used in Natural Language Processing, yet despite their empirical successes, their behaviour is brittle: they are both over-sensitive to small input changes, and under-sensitive to deletions of large fractions of input text.

Deep reinforcement learning has achieved great success in many previously difficult reinforcement learning tasks, yet recent studies show that deep RL agents are also unavoidably susceptible to adversarial perturbations, similar to deep neural networks in classification tasks.

Formal verification techniques that compute provable guarantees on properties of machine learning models, like robustness to norm-bounded adversarial perturbations, have yielded impressive results.

This paper studies the undesired phenomena of over-sensitivity of representations learned by deep networks to semantically-irrelevant changes in data.

Both algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to the forward/backward passes of neural network layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds.

Specifically, we leverage the disentangled latent representations computed by a StyleGAN model to generate perturbations of an image that are similar to real-world variations (like adding make-up, or changing the skin-tone of a person) and train models to be invariant to these perturbations.

This paper aims to quantify and reduce a particular type of bias exhibited by language models: bias in the sentiment of generated text.

We propose a `learning to explore' framework where we learn a policy from a distribution of environments.

Adversarial testing methods based on Projected Gradient Descent (PGD) are widely used for searching norm-bounded perturbations that cause the inputs of neural networks to be misclassified.
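
The PGD search referred to here can be sketched in a few lines of NumPy; the model below is a hypothetical stand-in (a fixed linear objective), not any particular network.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps, alpha, steps):
    """Projected gradient ascent inside the l_inf ball of radius eps around x.

    grad_fn(x_adv) returns the gradient of the attack objective w.r.t. the
    input; each step moves along its sign, then projects back onto the ball.
    """
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection step
    return x_adv

# Toy objective w . x (a placeholder for a network's loss): its maximiser
# over the l_inf ball is exactly eps * sign(w), which PGD recovers.
w = np.array([1.0, -2.0, 0.5])
x_adv = pgd_linf(np.zeros(3), grad_fn=lambda x: w, eps=0.1, alpha=0.03, steps=10)
```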

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

We use the data sets to conduct a thorough experimental comparison of existing and new algorithms and to provide an inclusive analysis of the factors impacting the hardness of verification problems.

Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks.

Using this regularizer, we exceed the current state of the art and achieve 47% adversarial accuracy on ImageNet with $\ell_\infty$ adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack.

Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification.

Medical imaging only indirectly measures the molecular identity of the tissue within each voxel, which often produces only ambiguous image evidence for target measures of interest, like semantic segmentation.

We present a deep reinforcement learning approach to minimizing the execution cost of neural network computation graphs in an optimizing compiler.

Reinforcement learning agents are typically trained and evaluated according to their performance averaged over some distribution of environment settings.

This paper addresses the challenging problem of retrieval and matching of graph structured objects, and makes two key contributions.

This behavior can have severe consequences, such as increased computation and faults induced in downstream modules that expect outputs of a certain length.

To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation.

Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes.

The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes.

We introduce a unified probabilistic framework for solving sequential decision making problems ranging from Bayesian optimisation to contextual bandits and reinforcement learning.

Machine learning is used extensively in recommender systems deployed in products.

We show that a number of important properties of interest can be modeled within this class, including conservation of energy in a learned dynamics model of a physical system; semantic consistency of a classifier's output labels under adversarial perturbations and bounding errors in a system that predicts the summation of handwritten digits.

Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation.

For example, a machine translation model should produce semantically equivalent outputs for innocuous changes in the input to the model.

We demonstrate this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation.

We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data.

We show that increasing the number of parameters in adversarially-trained models increases their robustness, and in particular that ensembling smaller models while adversarially training the entire ensemble as a single model is a more efficient use of a given parameter budget than simply using a larger single model.

Recent work has shown that it is possible to train deep neural networks that are provably robust to norm-bounded adversarial perturbations.
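
One simple mechanism behind such guarantees is interval bound propagation, sketched below as a generic illustration (not necessarily the exact method of the work cited): an $\ell_\infty$ box around the input is pushed through the network layer by layer in centre/radius form.

```python
import numpy as np

def interval_bounds(lower, upper, layers):
    """Propagate an element-wise box [lower, upper] through affine+ReLU layers.

    An affine layer W x + b maps the box, written as centre mu and radius r,
    to centre W mu + b and radius |W| r; ReLU then clamps both bounds at zero.
    """
    for W, b in layers:
        mu = (upper + lower) / 2.0
        r = (upper - lower) / 2.0
        mu, r = W @ mu + b, np.abs(W) @ r
        lower, upper = np.maximum(mu - r, 0.0), np.maximum(mu + r, 0.0)
    return lower, upper

# Toy one-layer network with made-up weights: every input inside the eps-box
# is guaranteed to produce an activation inside [lo, hi].
W = np.array([[1.0, -1.0], [0.5, 0.5]])
lo, hi = interval_bounds(np.full(2, -0.1), np.full(2, 0.1), [(W, np.zeros(2))])
```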

Second, the model is more data- and memory-efficient: it performs well after learning from a small amount of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.

As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration that can successfully be trained with reinforcement learning to solve unseen tasks, generalize to larger map sizes, and learn to navigate in dynamic environments.

This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties.

The presented algorithms can be applied to any labelling problem using a dense CRF with sparse higher-order potentials.

Program synthesis is the task of automatically generating a program consistent with a specification.

Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language.

In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs.

We introduce a new dataset of logical entailments for the purpose of measuring models' ability to capture and exploit the structure of logical expressions against an entailment prediction task.

We motivate 'adversarial risk' as an objective for achieving models robust to worst-case inputs.

Motivated by the need of accelerating progress in this very important area, we investigate the trade-offs of a number of different approaches based on Mixed Integer Programming, Satisfiability Modulo Theory, as well as a novel method based on the Branch-and-Bound framework.

At the core of our system is a physical world representation that is first recovered by a perception module and then utilized by physics and graphics engines.

The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models.

We study the problem of semantic code repair, which can be broadly defined as automatically fixing non-syntactic bugs in source code.

In our first proposal, portfolio adaptation, a set of induction models is pretrained on a set of related tasks, and the best model is adapted towards the new task using transfer learning.

By retargeting the PCA expression geometry from the source, as well as using the newly inferred texture, we can both animate the face and perform video face replacement on the source video using the target appearance.

A neural architecture first transforms a rasterized image to a set of junctions that represent low-level geometric and semantic information (e.g., wall corners or door end-points).

Our approach employs a deterministic rendering function as the decoder, mapping a naturally structured and disentangled scene description, which we named scene XML, to an image.

As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks.

Bayesian optimization (BO) has become an effective approach for black-box function optimization problems when function evaluations are expensive and the optimum can be achieved within a relatively small number of queries.
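
The basic loop this line of work builds on can be sketched with a small Gaussian-process surrogate and a UCB acquisition rule; the kernel, length-scale and other hyperparameters below are illustrative choices, not values from any particular paper.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and variance at query points Xs."""
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)
    return mu, np.maximum(var, 0.0)

def bayes_opt(f, budget=15, beta=2.0, seed=0):
    """GP-UCB on [0, 1]: fit the surrogate, query where mean + beta*std is largest."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, 3)               # small random initial design
    y = np.array([f(x) for x in X])
    grid = np.linspace(0.0, 1.0, 201)
    for _ in range(budget - len(X)):
        mu, var = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(mu + beta * np.sqrt(var))]   # optimism rule
        X, y = np.append(X, x_next), np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

# Maximise a smooth black-box function with only 15 evaluations.
x_best, y_best = bayes_opt(lambda x: -(x - 0.7) ** 2)
```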

We propose to learn such representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder.

We then present a novel neural synthesis algorithm to search for programs in the DSL that are consistent with a given set of examples.

Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation.

Optimization of high-dimensional black-box functions is an extremely challenging problem.

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems.

This paper proposes a novel MAP inference framework for Markov Random Field (MRF) in parallel computing environments.

Our result implies that neural networks are effective at perceptual tasks that require long periods of reasoning even for humans to solve.

Superoptimization requires the estimation of the best program for a given computational task.

A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs.

We develop a framework for incorporating structured graphical models in the \emph{encoders} of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference.

Gaussian Process bandit optimization has emerged as a powerful tool for optimizing noisy black box functions.

We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts.

While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network).

This approach involves repeated sampling of modifications to the program from a proposal distribution, which are accepted or rejected based on whether they preserve correctness, and the improvement they achieve.
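
That sampling loop can be sketched as a Metropolis-style search over a toy instruction set; the ops, cost function and penalty weight below are illustrative assumptions, not the paper's actual proposal distribution.

```python
import math
import random

OPS = {"inc": lambda v: v + 1, "dec": lambda v: v - 1, "dbl": lambda v: 2 * v}

def run(prog, v):
    for op in prog:
        v = OPS[op](v)
    return v

def cost(prog, ref, tests, penalty=1.0):
    """Shorter is better; disagreeing with the reference on a test adds a penalty."""
    wrong = sum(run(prog, t) != run(ref, t) for t in tests)
    return len(prog) + penalty * wrong

def mcmc_superoptimize(ref, tests, steps=20000, beta=1.0, seed=0):
    """Metropolis search: propose a random single edit, accept it with
    probability min(1, exp(-beta * cost_increase)); track the best correct program."""
    rng = random.Random(seed)
    prog, best = list(ref), list(ref)
    for _ in range(steps):
        cand = list(prog)
        move = rng.choice(["del", "sub", "ins"])
        if move == "del" and cand:
            cand.pop(rng.randrange(len(cand)))
        elif move == "sub" and cand:
            cand[rng.randrange(len(cand))] = rng.choice(list(OPS))
        else:
            cand.insert(rng.randrange(len(cand) + 1), rng.choice(list(OPS)))
        delta = cost(cand, ref, tests) - cost(prog, ref, tests)
        if rng.random() < min(1.0, math.exp(-beta * delta)):
            prog = cand
        if cost(prog, ref, tests) == len(prog) and len(prog) < len(best):
            best = list(prog)   # zero wrong tests and shorter than the incumbent
    return best

redundant = ["inc", "inc", "dec", "dbl"]   # computes 2*v + 2 with a wasted pair
best = mcmc_superoptimize(redundant, tests=range(-3, 4))
```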

Combining abstract, symbolic reasoning with continuous neural reasoning is a grand challenge of representation learning.

We introduce a convolutional neural network for inferring a compact disentangled graphical description of objects from 2D images that can be used for volumetric reconstruction.

In contrast to the continuous relaxation-based energy minimisation algorithms used for sparse CRFs, the mean-field algorithm fails to provide strong theoretical guarantees on the quality of its solutions.

TerpreT is similar to a probabilistic programming language: a model is composed of a specification of a program representation (declarations of random variables) and an interpreter describing how programs map inputs to outputs (a model connecting unknowns to observations).

This paper addresses the challenging problem of perceiving the hidden or occluded geometry of the scene depicted in any given RGBD image.

In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures.

We show that it is possible to compile programs written in a low-level language to a differentiable representation.

We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.

We created a new corpus of ~50k five-sentence commonsense stories, ROCStories, to enable this evaluation.

We demonstrate the efficacy of our approach on the challenging problem of RGB Camera Relocalization.

In particular, 3D context has been shown to be an extremely important cue for scene understanding - yet very little research has been done on integrating context information with deep models.

In this paper, we show that we can significantly improve upon black-box optimization by exploiting high-level knowledge of the structure of the parameters and using a local surrogate energy function.

In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with the leaf parameters, based on a global objective.

To relate the quality of a judgment to the time a worker spends on a task, our model assumes that each task is completed within a latent time window within which all workers with a propensity to genuinely attempt the labelling task (i.e., no spammers) are expected to submit their judgments.

Furthermore, we consider an embedding of the tasks and workers in an underlying graph that may arise from task similarities or social ties, and that can provide additional side-observations for faster learning.

We develop a convex-concave upper bound on the classification loss for a one-level decision tree, and optimize the bound by stochastic gradient descent at each internal node of the tree.

This paper addresses the problem of learning long binary codes from high-dimensional data.

We demonstrate this technique on large retrieval databases, specifically ImageNet, GIST1M and SUN-attribute, for the task of nearest neighbor retrieval, and show that our method achieves a speed-up of up to a factor of 100 over state-of-the-art methods, while having on-par and in some cases even better accuracy.
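
The speed advantage of binary codes comes from replacing floating-point distance computations with XOR-and-popcount Hamming distances on packed machine words; the following NumPy sketch is a generic illustration, independent of the paper's specific hashing method.

```python
import numpy as np

def pack_codes(bits):
    """Pack an (n, 64) matrix of 0/1 values into one uint64 code per row."""
    weights = np.left_shift(np.uint64(1), np.arange(64, dtype=np.uint64))
    return (bits.astype(np.uint64) * weights).sum(axis=1)

def hamming_nn(query, db):
    """Index of the nearest code in `db` under Hamming distance (XOR + popcount)."""
    x = np.bitwise_xor(db, query)
    # popcount: view each uint64 as 8 bytes, unpack to bits, sum per row
    dists = np.unpackbits(x.view(np.uint8).reshape(len(db), 8), axis=1).sum(axis=1)
    return int(np.argmin(dists))

# Tiny hand-built database: all-zeros, five bits set, all-ones.
bits = np.zeros((3, 64), dtype=np.uint8)
bits[1, :5] = 1
bits[2, :] = 1
db = pack_codes(bits)
```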

Recent progress on probabilistic modeling and statistical learning, coupled with the availability of large training datasets, has led to remarkable progress in computer vision.

We propose a novel approach to reduce the computational cost of evaluation of convolutional neural networks, a factor that has hindered their deployment in low-power devices such as mobile phones.

How should we gather information in a network, where each node's visibility is limited to its local neighborhood?

This paper presents the Deep Convolutional Inverse Graphics Network (DC-IGN), a model that learns an interpretable representation of images.

In this work, we investigate the use of sparsity-inducing regularizers during training of Convolutional Neural Networks (CNNs).

Much of research in machine learning has centered around the search for inference algorithms that are both general-purpose and efficient.

Generative models provide a powerful framework for probabilistic reasoning.

We show that it is possible to solve challenging, real-world 3D vision problems by approximate inference in generative models for images based on rendering the outputs of probabilistic CAD (PCAD) programs.

Structured-output learning is a challenging problem; particularly so because of the difficulty in obtaining large datasets of fully labelled instances for training.

We propose 'filter forests' (FF), an efficient new discriminative approach for predicting continuous variables given a signal and its context.

We formulate this problem as inversion of the generative rendering procedure, i.e., we want to find the camera pose corresponding to a rendering of the 3D scene model that is most similar to the observed input.

Human gestures, similar to speech and handwriting, are often unique to the individual.

Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision.

This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster.

We show how this constrained discrete optimization problem can be formulated as a multi-dimensional parametric mincut problem via its Lagrangian dual, and prove that our algorithm isolates all constraint instances for which the problem can be solved exactly.

However, for many computer vision problems, the MAP solution under the model is not the ground truth solution.

Conditional Random Fields (CRF) are among the most popular techniques for image labelling because of their flexibility in modelling dependencies between the labels and the image features.

We discuss a model for image segmentation that is able to overcome the short-boundary bias observed in standard pairwise random field based approaches.

Traditional video compression methods obtain a compact representation for image frames by computing coarse motion fields defined on patches of pixels called blocks, in order to compensate for the motion in the scene across frames.
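
The coarse block-based motion estimation described here can be sketched as exhaustive block matching under a sum-of-absolute-differences criterion; this is a textbook baseline, not the paper's contribution, and the block and search sizes are illustrative.

```python
import numpy as np

def block_motion(prev, cur, block=4, search=2):
    """Exhaustive block matching: for each block-sized tile of `cur`, find the
    offset (dy, dx) into `prev`, within +/- `search` pixels, minimising the
    sum of absolute differences (SAD)."""
    h, w = cur.shape
    motion = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            tile = cur[by:by + block, bx:bx + block]
            best, best_sad = (0, 0), float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        sad = np.abs(prev[y:y + block, x:x + block] - tile).sum()
                        if sad < best_sad:
                            best, best_sad = (dy, dx), sad
            motion[by, bx] = best
    return motion

prev = np.arange(64, dtype=float).reshape(8, 8)
cur = np.roll(prev, 1, axis=1)   # same frame shifted one pixel to the right
motion = block_motion(prev, cur)
```

For interior blocks the recovered vector is (0, -1): each tile of the shifted frame matches the previous frame one pixel to its left.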

This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on.

Experimental results show that the spatial dependencies learned by our method significantly improve the accuracy of segmentation.

In this paper we introduce Context-Sensitive Decision Forests, a new perspective for exploiting contextual information in the popular decision forest framework for the object detection problem.

The paper addresses the problem of generating multiple hypotheses for prediction tasks that involve interaction with users or successive components in a cascade.

For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step.

We consider the question of computing Maximum A Posteriori (MAP) assignment in an arbitrary pair-wise Markov Random Field (MRF).
