Trending Research

Ordered by accumulated GitHub stars over the last 3 days
1
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer networks have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, Transformer-XL, that enables the Transformer to learn dependency beyond a fixed length without disrupting temporal coherence.
298
5.03 stars / hour
2
EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using the hallucinated edges as a prior. We evaluate our model end-to-end over the publicly available datasets CelebA, Places2, and Paris StreetView, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively.
139
3.16 stars / hour
3
PyOD: A Python Toolbox for Scalable Outlier Detection
PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers.

919
1.74 stars / hour
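The fit/score workflow that a toolbox like PyOD standardizes can be sketched with a toy detector of the same shape. The class below is a hypothetical stand-in using a simple z-score rule, not PyOD's own code; only the naming convention (`fit`, `decision_scores_`, `predict`) mirrors the toolkit.

```python
import numpy as np

class ZScoreDetector:
    """Toy outlier detector following the fit / decision-score / predict
    convention that outlier-detection toolkits such as PyOD expose."""

    def __init__(self, threshold=4.0):
        self.threshold = threshold

    def fit(self, X):
        # Estimate per-feature mean and spread from the training data.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12
        # Outlier score of each training point: largest absolute z-score.
        self.decision_scores_ = np.abs((X - self.mean_) / self.std_).max(axis=1)
        return self

    def predict(self, X):
        # 1 marks an outlier, 0 an inlier, mirroring PyOD's label convention.
        scores = np.abs((X - self.mean_) / self.std_).max(axis=1)
        return (scores > self.threshold).astype(int)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
detector = ZScoreDetector(threshold=4.0).fit(X_train)
labels = detector.predict(np.array([[0.1, -0.2, 0.0], [9.0, 9.0, 9.0]]))
print(labels)  # the far-away second point is flagged as an outlier
```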
4
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. To further enhance the visual quality, we thoroughly study three key components of SRGAN (network architecture, adversarial loss, and perceptual loss) and improve each of them to derive an Enhanced SRGAN (ESRGAN).

639
1.06 stars / hour
5
Dataset Distillation
Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge from a large training dataset into a small one.
122
1.03 stars / hour
6
Object Detection with Pixel Intensity Comparisons Organized in Decision Trees
We describe a method for visual object detection based on an ensemble of optimized decision trees organized in a cascade of rejectors. The trees use pixel intensity comparisons in their internal nodes, which enables them to process image regions very quickly.

1,392
0.94 stars / hour
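An internal-node test of the kind the abstract describes is just two memory reads and a comparison, which is why such trees are fast. The sketch below is illustrative (the coordinates and tree layout are made up, not the paper's learned ones):

```python
import numpy as np

def pixel_test(region, p1, p2):
    """Compare the intensities of two pixels; branch right (1) if the
    first is not brighter than the second, else left (0)."""
    return 1 if region[p1] <= region[p2] else 0

def traverse(region, depth, tests):
    """Walk a complete binary tree of pixel tests stored as a flat
    index->(p1, p2) map; return the index of the leaf reached."""
    idx = 0
    for _ in range(depth):
        p1, p2 = tests[idx]
        idx = 2 * idx + 1 + pixel_test(region, p1, p2)
    return idx

# A 2x2 toy image region and a depth-2 tree of made-up comparisons.
region = np.array([[10, 200], [50, 50]])
tests = {0: ((0, 0), (0, 1)), 1: ((1, 0), (1, 1)), 2: ((0, 0), (1, 0))}
leaf = traverse(region, depth=2, tests=tests)
print(leaf)
```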
7
EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using the hallucinated edges as a prior. We evaluate our model end-to-end over the publicly available datasets CelebA, Places2, and Paris StreetView, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively.
544
0.94 stars / hour
8
Neural Ordinary Differential Equations
Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed.
1,482
0.47 stars / hour
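A minimal sketch of the continuous-depth idea: a hand-rolled derivative function and a fixed-step Euler solver stand in for the paper's learned network and adaptive ODE solver. The step count is the precision-for-speed knob the abstract mentions, and memory is constant because only the current state is kept.

```python
import numpy as np

def f(h, t, W):
    """A tiny stand-in 'derivative network': dh/dt = tanh(W @ h).
    In the paper, f is a full neural network with learned parameters."""
    return np.tanh(W @ h)

def odeint_euler(h0, W, t0=0.0, t1=1.0, steps=20):
    """Fixed-step Euler integration of the hidden state: fewer steps
    trades numerical precision for speed."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t, W)
        t += dt
    return h

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(4, 4))
h0 = rng.normal(size=4)
coarse = odeint_euler(h0, W, steps=5)
fine = odeint_euler(h0, W, steps=500)
# Both approximate the same continuous trajectory; the gap is the
# discretization error bought back by the coarser solver.
print(np.linalg.norm(coarse - fine))
```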
9
LanczosNet: Multi-Scale Deep Graph Convolutional Networks
We propose the Lanczos network (LanczosNet), which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we not only efficiently exploit multi-scale information via fast approximated computation of matrix power but also design learnable spectral filters.

80
0.47 stars / hour
10
DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation
We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition.

860
0.43 stars / hour
11
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
10,869
0.41 stars / hour
12
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. What will happen if we increase the dataset size by 10x or 100x?
1,954
0.34 stars / hour
13
Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning
In existing visual representation learning tasks, deep convolutional neural networks (CNNs) are often trained on images annotated with single tags, such as ImageNet. In this work, we propose to train CNNs from images annotated with multiple tags, to enhance the quality of visual representation of the trained CNN model.
1,954
0.34 stars / hour
14
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards.
118,393
0.34 stars / hour
15
models
Models and examples built with TensorFlow

47,049
0.30 stars / hour
16
Listen, Attend and Spell
Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller.

111
0.29 stars / hour
17
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created with a subset of input symbols selected by the attention mechanism.

111
0.29 stars / hour
18
Exploring the Limits of Language Modeling
In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language.
111
0.29 stars / hour
19
EANet: Enhancing Alignment for Cross-Domain Person Re-identification
Person re-identification (ReID) has achieved significant improvement under the single-domain setting. Finally, we show that applying our PS constraint to unlabeled target domain images serves as effective domain adaptation.

142
0.27 stars / hour
20
Speaker-Follower Models for Vision-and-Language Navigation
Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.
18
0.27 stars / hour
21
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal.

18
0.27 stars / hour
22
SGM: Sequence Generation Model for Multi-label Classification
Existing methods tend to ignore the correlations between labels. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.

161
0.26 stars / hour
23
Anytime Stereo Image Depth Estimation on Mobile Devices
Many real-world applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints. Current state-of-the-art algorithms can either generate accurate but slow mappings, or fast but inaccurate ones, and typically require far too many parameters for power- or memory-constrained devices.

41
0.25 stars / hour
24
Deep Residual Learning for Image Recognition
We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
10,990
0.24 stars / hour
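The core formulation is that a block learns a residual F(x) and outputs F(x) + x, so the identity mapping is trivially representable and very deep stacks stay trainable. A minimal sketch with dense matrices standing in for the paper's convolutions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Residual learning: compute a residual branch F(x), then add the
    input back through the skip connection before the final activation.
    W1 and W2 are illustrative dense weights, not the paper's convs."""
    f = relu(x @ W1) @ W2   # residual branch F(x)
    return relu(f + x)      # skip connection: F(x) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
# With zero weights, F(x) = 0 and the block reduces to the identity
# (up to the final ReLU) -- the easy solution residual learning enables.
out = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
print(np.allclose(out, relu(x)))  # True
```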
25
SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels
We present Spline-based Convolutional Neural Networks (SplineCNNs), a variant of deep neural networks for irregular structured and geometric input, e.g., graphs or meshes. Our main contribution is a novel convolution operator based on B-splines, that makes the computation time independent from the kernel size due to the local support property of the B-spline basis functions.
1,031
0.23 stars / hour
26
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene.

960
0.23 stars / hour
27
Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control
Flow is a new computational framework, built to support a key need triggered by the rapid growth of autonomy in ground traffic: controllers for autonomous vehicles in the presence of complex nonlinear dynamics in traffic. Leveraging recent advances in deep Reinforcement Learning (RL), Flow enables the use of RL methods such as policy gradient for traffic control and enables benchmarking the performance of classical (including hand-designed) controllers with learned policies (control laws).

141
0.23 stars / hour
28
Consistent Individualized Feature Attribution for Tree Ensembles
Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases.

3,278
0.22 stars / hour
29
Neural Nearest Neighbors Networks
To exploit our relaxation, we propose the neural nearest neighbors block (N3 block), a novel non-local processing layer that leverages the principle of self-similarity and can be used as a building block in modern neural network architectures. We show its effectiveness for the set reasoning task of correspondence classification as well as for image restoration, including image denoising and single image super-resolution, where we outperform strong convolutional neural network (CNN) baselines and recent non-local models that rely on KNN selection in hand-chosen feature spaces.
141
0.21 stars / hour
30
Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Rotation-invariant face detection, i.e. detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applications but still remains a challenging task due to the large variations of face appearances. To address this problem more efficiently, we propose Progressive Calibration Networks (PCN) to perform rotation-invariant face detection in a coarse-to-fine manner.
714
0.21 stars / hour
31
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
245
0.21 stars / hour
32
Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network conditioned on semantic segmentation probability maps.
245
0.21 stars / hour
33
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. To further enhance the visual quality, we thoroughly study three key components of SRGAN (network architecture, adversarial loss, and perceptual loss) and improve each of them to derive an Enhanced SRGAN (ESRGAN).

245
0.21 stars / hour
34
DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks
Despite a rapid rise in the quality of built-in smartphone cameras, their physical limitations (small sensor size, compact lenses, and the lack of specific hardware) prevent them from achieving the quality results of DSLR cameras. In this work we present an end-to-end deep learning approach that bridges this gap by translating ordinary photos into DSLR-quality images.
762
0.20 stars / hour
35
Mask R-CNN
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection.
9,648
0.20 stars / hour
36
FIGR: Few-shot Image Generation with Reptile
Generative Adversarial Networks (GAN) boast impressive capacity to generate realistic images. However, like much of the field of deep learning, they require an inordinate amount of data to produce results, thereby limiting their usefulness in generating novelty.
42
0.20 stars / hour
37
SEGAN: Speech Enhancement Generative Adversarial Network
The majority of them tackle a limited number of noise conditions and rely on first-order statistics. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.

40
0.20 stars / hour
38
Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
This makes the adaptability of those systems into new, low resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.

40
0.20 stars / hour
39
Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks
Most methods of voice restoration for patients suffering from aphonia either produce whispered or monotone speech. Apart from intelligibility, this type of speech lacks expressiveness and naturalness due to the absence of pitch (whispered speech) or artificial generation of it (monotone speech).

40
0.20 stars / hour
40
Auto-Keras: Efficient Neural Architecture Search with Network Morphism
Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling more efficient training during the search.
4,267
0.19 stars / hour
41
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism.
1,862
0.19 stars / hour
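The mechanism the Transformer is built from is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. A minimal NumPy sketch (the shapes are chosen for illustration only):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V: each query
    produces a weighting over keys, and the output is the
    correspondingly weighted average of the values."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries of dimension d_k = 8
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 16))  # one value vector per key
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (3, 16) (3, 5)
```

Each row of `w` is a probability distribution over the 5 keys, which is what makes the mechanism interpretable as "attending" to inputs.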
42
A Structured Self-attentive Sentence Embedding
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence.

1,862
0.19 stars / hour
43
dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
6,957
0.18 stars / hour
44
Scene Text Detection and Recognition: The Deep Learning Era
As an important research area in computer vision, scene text detection and recognition has been inescapably influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, approach and performance.

166
0.17 stars / hour
45
MatchZoo: A Toolkit for Deep Text Matching
In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models.

1,972
0.17 stars / hour
46
Detectron
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
18,576
0.17 stars / hour
47
Video-to-Video Synthesis
We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality.
5,721
0.16 stars / hour
48
Semi-Supervised Classification with Graph Convolutional Networks
We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions.

2,143
0.16 stars / hour
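The localized first-order propagation rule this paper is known for is H' = ReLU(D^-1/2 (A + I) D^-1/2 H W): each node averages renormalized neighbor features, then applies a shared linear map. A minimal sketch on a toy graph:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: add self-loops, symmetrically
    normalize the adjacency, aggregate neighbor features, then apply
    a shared linear map and a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])          # self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# A 4-node path graph, 2-dim node features, and a 2 -> 3 layer.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 2))
W = rng.normal(size=(2, 3))
H_next = gcn_layer(A, H, W)
print(H_next.shape)  # (4, 3)
```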
49
Revisiting Semi-Supervised Learning with Graph Embeddings
We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph.

2,143
0.16 stars / hour
50
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
In this work, we are interested in generalizing convolutional neural networks (CNNs) from low-dimensional regular grids, where image, video and speech are represented, to high-dimensional irregular domains, such as social networks, brain connectomes or words' embedding, represented by graphs. We present a formulation of CNNs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs.
2,143
0.16 stars / hour
51
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional (CNN) neural networks, has made it easier to train speech recognition systems in an end-to-end fashion. Quaternion numbers and quaternion neural networks have shown their efficiency at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models.

44
0.16 stars / hour
52
Quaternion Recurrent Neural Networks
Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector.

44
0.16 stars / hour
53
Neural Arithmetic Logic Units
Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates.
18
0.16 stars / hour
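The additive core of this architecture is the neural accumulator (NAC): the effective weight matrix W = tanh(W_hat) * sigmoid(M_hat) is biased toward {-1, 0, 1}, so outputs are additions and subtractions of inputs and extrapolate outside the training range. A forward-pass sketch with hand-saturated (not learned) parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nac_forward(x, W_hat, M_hat):
    """Neural accumulator: the effective weight tanh(W_hat)*sigmoid(M_hat)
    is pushed toward -1, 0, or 1, making the unit a learned
    addition/subtraction over its inputs."""
    W = np.tanh(W_hat) * sigmoid(M_hat)
    return x @ W

# Saturated parameters make both factors ~1, so the unit computes a + b
# even for inputs far larger than anything "trained" on.
W_hat = np.array([[10.0], [10.0]])
M_hat = np.array([[10.0], [10.0]])
x = np.array([[2.0, 3.0], [100.0, 250.0]])
print(nac_forward(x, W_hat, M_hat))  # ~[[5.], [350.]]
```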
54
Generative Adversarial Source Separation
Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density.

23
0.16 stars / hour
55
Enriching Word Vectors with Subword Information
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. A vector representation is associated with each character n-gram; words are represented as the sum of these representations.

17,166
0.15 stars / hour
56
FastText.zip: Compressing text classification models
We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method built upon product quantization to store word embeddings.

17,166
0.15 stars / hour
57
Bag of Tricks for Efficient Text Classification
This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation.

17,166
0.15 stars / hour
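The model shape the paper describes, look up word vectors, average them into a sentence vector, then apply a linear classifier, can be sketched in a few lines. The vocabulary, embeddings, and classifier weights below are random illustrative parameters, not trained ones, and this is not fastText's implementation.

```python
import numpy as np

def bag_of_words_score(tokens, vocab, E, W):
    """fastText-style text classification sketch: average the embedding
    vectors of the tokens (a bag-of-words sentence vector), then apply
    a linear layer to get one score per class."""
    idx = [vocab[t] for t in tokens if t in vocab]
    sentence = E[idx].mean(axis=0)   # average of word vectors
    return sentence @ W              # linear classifier

rng = np.random.default_rng(0)
vocab = {"good": 0, "bad": 1, "movie": 2}
E = rng.normal(size=(3, 4))   # one 4-dim embedding per vocab word
W = rng.normal(size=(4, 2))   # 2-class linear classifier
scores = bag_of_words_score(["good", "movie"], vocab, E, W)
print(scores.shape)  # (2,)
```

Because the sentence representation is just an average, both training and inference are linear in the number of tokens, which is the source of the speed claim.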
58
openpose
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
10,656
0.15 stars / hour
59
Memory-Efficient Implementation of DenseNets
The DenseNet architecture is highly computationally efficient as a result of feature reuse. A 264-layer DenseNet (73M parameters), which previously would have been infeasible to train, can now be trained on a single workstation with 8 NVIDIA Tesla M40 GPUs.
797
0.14 stars / hour
60
Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
Learning when to communicate and doing that effectively is essential in multi-agent tasks. Recent works show that continuous communication allows efficient training with back-propagation in multi-agent scenarios, but have been restricted to fully-cooperative tasks.

25
0.14 stars / hour
61
AllenNLP: A Deep Semantic Natural Language Processing Platform
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily.

4,981
0.14 stars / hour
62
Fully Supervised Speaker Diarization
In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), which operates on extracted speaker-discriminative embeddings (a.k.a. d-vectors).

700
0.14 stars / hour
63
McTorch, a manifold optimization library for deep learning
In this paper, we introduce McTorch, a manifold optimization library for deep learning that extends PyTorch. It aims to lower the barrier for users wishing to use manifold constraints in deep learning applications, i.e., when the parameters are constrained to lie on a manifold.

45
0.13 stars / hour
64
Horovod: fast and easy distributed deep learning in TensorFlow
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling training across many GPUs requires communication among them; depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead.

4,953
0.13 stars / hour
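The synchronization primitive at the heart of this kind of distributed training is an all-reduce over gradients: every worker ends up with the element-wise average of all workers' gradients and applies an identical update. The sketch below is a naive reference computation of that result, not Horovod's ring-allreduce implementation.

```python
import numpy as np

def allreduce_average(grads):
    """Reference all-reduce: return, for every worker, the element-wise
    average of all workers' gradient arrays, so each worker applies
    the same model update."""
    avg = sum(grads) / len(grads)
    return [avg.copy() for _ in grads]

rng = np.random.default_rng(0)
worker_grads = [rng.normal(size=4) for _ in range(3)]  # 3 simulated workers
synced = allreduce_average(worker_grads)
print(np.allclose(synced[0], synced[2]))  # True: all workers agree
```

A ring implementation computes the same average while each node only exchanges chunks with its two neighbors, which is what keeps the communication overhead low at scale.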
65
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. Our experiments show that CROSSBOW improves the training time of deep learning models on an 8-GPU server by 1.3-4x compared to TensorFlow.

13
0.13 stars / hour
66
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
91
0.13 stars / hour
67
wav2letter++: The Fastest Open-source Speech Recognition System
This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency.

3,188
0.12 stars / hour
68
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
In this paper, we present ProxylessNAS, which can directly learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level as regular training while still allowing a large candidate set.
253
0.12 stars / hour