# Trending Research

Ordered by accumulated GitHub stars in the last 3 days
##### Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer networks have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, *Transformer-XL*, that enables Transformer to learn dependency beyond a fixed length without disrupting temporal coherence.
298
5.03 stars / hour
##### EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using hallucinated edges as a priori. We evaluate our model end-to-end over the publicly available datasets CelebA, Places2, and Paris StreetView, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively.
139
3.16 stars / hour
##### PyOD: A Python Toolbox for Scalable Outlier Detection
PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers.

919
1.74 stars / hour
##### ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

639
1.06 stars / hour
##### Dataset Distillation
Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge from a large training dataset into a small one.
122
1.03 stars / hour
##### Object Detection with Pixel Intensity Comparisons Organized in Decision Trees
We describe a method for visual object detection based on an ensemble of optimized decision trees organized in a cascade of rejectors. The trees use pixel intensity comparisons in their internal nodes and this makes them able to process image regions very fast.

1,392
0.94 stars / hour
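
The pixel-intensity test described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the nested-dict tree layout, the `pixel_test` and `tree_score` names, and the toy patch values are all assumptions made for illustration.

```python
def pixel_test(image, p1, p2):
    """Binary node test: is the intensity at p1 at most the intensity at p2?"""
    (r1, c1), (r2, c2) = p1, p2
    return image[r1][c1] <= image[r2][c2]


def tree_score(image, node):
    """Walk a (hypothetical) tree of pixel tests until a leaf score is reached."""
    while "score" not in node:
        branch = "left" if pixel_test(image, node["p1"], node["p2"]) else "right"
        node = node[branch]
    return node["score"]


# Toy 3x3 grayscale patch and a hand-written depth-2 tree (illustrative values only).
patch = [
    [10, 200, 30],
    [40, 50, 60],
    [70, 80, 90],
]
tree = {
    "p1": (0, 0), "p2": (0, 1),
    "left": {
        "p1": (2, 2), "p2": (1, 1),
        "left": {"score": -1.0},
        "right": {"score": 0.8},
    },
    "right": {"score": -0.5},
}

print(tree_score(patch, tree))
```

Each internal node reads only two pixels, which is why evaluating a region is cheap; a cascade would sum such scores over many trees and reject a region early once the running total falls below a threshold.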
##### EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using hallucinated edges as a priori. We evaluate our model end-to-end over the publicly available datasets CelebA, Places2, and Paris StreetView, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively.
544
0.94 stars / hour
##### Neural Ordinary Differential Equations
Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed.
1,482
0.47 stars / hour
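
To make the idea of parameterizing the derivative of the hidden state concrete, here is a minimal sketch that integrates dh/dt = f(h, t) with fixed-step forward Euler. It is an illustration only: the one-layer `dynamics` function and its parameters `w` and `b` are hypothetical stand-ins for a trained network, and the fixed-step solver is a simplification chosen for brevity.

```python
import math


def dynamics(h, t, w, b):
    """Stand-in for the neural network f(h, t): one tanh unit applied elementwise.

    w and b are hypothetical parameters; t is unused in this toy example.
    """
    return [math.tanh(w * x + b) for x in h]


def odeint_euler(h0, t0, t1, steps, w, b):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step forward Euler."""
    h = list(h0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        dh = dynamics(h, t, w, b)
        h = [x + dt * d for x, d in zip(h, dh)]
        t += dt
    return h


# The hidden state at t=1 plays the role of the output of a stack of layers.
print(odeint_euler([1.0, -2.0], 0.0, 1.0, steps=10, w=0.5, b=0.0))
```

In this sketch the `steps` argument is a crude analogue of the precision-for-speed trade-off mentioned in the abstract: fewer steps are cheaper but less accurate.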
##### LanczosNet: Multi-Scale Deep Graph Convolutional Networks
We propose the Lanczos network (LanczosNet), which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we not only efficiently exploit multi-scale information via fast approximated computation of matrix power but also design learnable spectral filters.

80
0.47 stars / hour
##### DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation
We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition.

860
0.43 stars / hour
##### BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
10,869
0.41 stars / hour
##### Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. What will happen if we increase the dataset size by 10x or 100x?
1,954
0.34 stars / hour
##### Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning
In existing visual representation learning tasks, deep convolutional neural networks (CNNs) are often trained on images annotated with single tags, such as ImageNet. In this work, we propose to train CNNs from images annotated with multiple tags, to enhance the quality of visual representation of the trained CNN model.
1,954
0.34 stars / hour
##### TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards.
118,393
0.34 stars / hour
##### models
Models and examples built with TensorFlow

47,049
0.30 stars / hour
##### Listen, Attend and Spell
Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller.

111
0.29 stars / hour
##### End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
We replace the Hidden Markov Model (HMM) which is traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created with a subset of input symbols elected by the attention mechanism.

111
0.29 stars / hour
##### Exploring the Limits of Language Modeling
In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language.
111
0.29 stars / hour
##### EANet: Enhancing Alignment for Cross-Domain Person Re-identification
Person re-identification (ReID) has achieved significant improvement under the single-domain setting. Finally, we show that applying our PS constraint to unlabeled target domain images serves as effective domain adaptation.

142
0.27 stars / hour
##### Speaker-Follower Models for Vision-and-Language Navigation
Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction.
18
0.27 stars / hour
##### Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal.

18
0.27 stars / hour
##### SGM: Sequence Generation Model for Multi-label Classification
Existing methods tend to ignore the correlations between labels. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.

161
0.26 stars / hour
##### Anytime Stereo Image Depth Estimation on Mobile Devices
Many real-world applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints. Current state-of-the-art algorithms can either generate accurate but slow mappings, or fast but inaccurate ones, and typically require far too many parameters for power- or memory-constrained devices.

41
0.25 stars / hour
##### Deep Residual Learning for Image Recognition
We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
10,990
0.24 stars / hour
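
The residual learning idea can be sketched as a block that adds its input back to a learned transformation, y = F(x) + x. The sketch below is an assumption-laden toy: it uses plain fully connected layers on Python lists rather than convolutions, and the helper names are made up for illustration.

```python
def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


# With all-zero weights F(x) is zero, so the block reduces to relu(x).
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block([1.0, -2.0, 3.0], zeros, zeros))
```

Because the shortcut carries `x` through unchanged, the layers only have to learn the residual F(x), which is the property the abstract credits with easing the training of much deeper networks.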
##### SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels
We present Spline-based Convolutional Neural Networks (SplineCNNs), a variant of deep neural networks for irregular structured and geometric input, e.g., graphs or meshes. Our main contribution is a novel convolution operator based on B-splines, that makes the computation time independent from the kernel size due to the local support property of the B-spline basis functions.
1,031
0.23 stars / hour
##### GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene.

960
0.23 stars / hour
##### Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control
Flow is a new computational framework, built to support a key need triggered by the rapid growth of autonomy in ground traffic: controllers for autonomous vehicles in the presence of complex nonlinear dynamics in traffic. Leveraging recent advances in deep Reinforcement Learning (RL), Flow enables the use of RL methods such as policy gradient for traffic control and enables benchmarking the performance of classical (including hand-designed) controllers with learned policies (control laws).

141
0.23 stars / hour
##### Consistent Individualized Feature Attribution for Tree Ensembles
Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases.

3,278
0.22 stars / hour
##### Neural Nearest Neighbors Networks
To exploit our relaxation, we propose the neural nearest neighbors block (N3 block), a novel non-local processing layer that leverages the principle of self-similarity and can be used as building block in modern neural network architectures. We show its effectiveness for the set reasoning task of correspondence classification as well as for image restoration, including image denoising and single image super-resolution, where we outperform strong convolutional neural network (CNN) baselines and recent non-local models that rely on KNN selection in hand-chosen features spaces.
141
0.21 stars / hour
##### Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Rotation-invariant face detection, i.e. detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applications but still remains as a challenging task, due to the large variations of face appearances. To address this problem more efficiently, we propose Progressive Calibration Networks (PCN) to perform rotation-invariant face detection in a coarse-to-fine manner.
714
0.21 stars / hour
##### Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
245
0.21 stars / hour
##### Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network conditioned on semantic segmentation probability maps.
245
0.21 stars / hour
##### ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

245
0.21 stars / hour
##### DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks
Despite a rapid rise in the quality of built-in smartphone cameras, their physical limitations (small sensor size, compact lenses and the lack of specific hardware) prevent them from achieving the quality results of DSLR cameras. In this work we present an end-to-end deep learning approach that bridges this gap by translating ordinary photos into DSLR-quality images.
762
0.20 stars / hour
##### Mask R-CNN
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection.
9,648
0.20 stars / hour
##### FIGR: Few-shot Image Generation with Reptile
Generative Adversarial Networks (GAN) boast impressive capacity to generate realistic images. However, like much of the field of deep learning, they require an inordinate amount of data to produce results, thereby limiting their usefulness in generating novelty.
42
0.20 stars / hour
##### SEGAN: Speech Enhancement Generative Adversarial Network
The majority of them tackle a limited number of noise conditions and rely on first-order statistics. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.

40
0.20 stars / hour
##### Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
This makes the adaptability of those systems into new, low resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.

40
0.20 stars / hour
##### Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks
Most methods of voice restoration for patients suffering from aphonia either produce whispered or monotone speech. Apart from intelligibility, this type of speech lacks expressiveness and naturalness due to the absence of pitch (whispered speech) or artificial generation of it (monotone speech).

40
0.20 stars / hour
##### Auto-Keras: Efficient Neural Architecture Search with Network Morphism
Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling a more efficient training during the search.
4,267
0.19 stars / hour
##### Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism.
1,862
0.19 stars / hour
##### A Structured Self-attentive Sentence Embedding
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence.

1,862
0.19 stars / hour
##### dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
6,957
0.18 stars / hour
##### Scene Text Detection and Recognition: The Deep Learning Era
As an important research area in computer vision, scene text detection and recognition has been inescapably influenced by this wave of revolution, consequentially entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, approach and performance.

166
0.17 stars / hour
##### MatchZoo: A Toolkit for Deep Text Matching
In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models.

1,972
0.17 stars / hour
##### Detectron
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
18,576
0.17 stars / hour
##### Video-to-Video Synthesis
We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality.
5,721
0.16 stars / hour
##### Semi-Supervised Classification with Graph Convolutional Networks
We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions.

2,143
0.16 stars / hour
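
A single graph convolution step of the kind described above can be sketched in a few lines. The sketch assumes the commonly cited propagation rule ReLU(D^{-1/2} (A + I) D^{-1/2} X W) and uses dense Python lists for readability; the function names are illustrative, and this is not the authors' code.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix (list of rows)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in h]


# Two connected nodes with one feature each: every node averages itself and its neighbor.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```

Each layer mixes a node's features only with those of its direct neighbors, so stacking k layers reaches k hops.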
##### Revisiting Semi-Supervised Learning with Graph Embeddings
We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph.

2,143
0.16 stars / hour
##### Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
In this work, we are interested in generalizing convolutional neural networks (CNNs) from low-dimensional regular grids, where image, video and speech are represented, to high-dimensional irregular domains, such as social networks, brain connectomes or words' embedding, represented by graphs. We present a formulation of CNNs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs.
2,143
0.16 stars / hour
##### Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN) made it easier to train speech recognition systems in an end-to-end fashion. Quaternion numbers and quaternion neural networks have shown their efficiency to process multidimensional inputs as entities, to encode internal dependencies, and to solve many tasks with fewer learning parameters than real-valued models.

44
0.16 stars / hour
##### Quaternion Recurrent Neural Networks
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector.

44
0.16 stars / hour
##### Neural Arithmetic Logic Units
Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates.
18
0.16 stars / hour
##### Generative Adversarial Source Separation
Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density.

23
0.16 stars / hour
##### Enriching Word Vectors with Subword Information
Continuous word representations trained on large unlabeled corpora are useful for many natural language processing tasks. A vector representation is associated with each character $n$-gram, with words being represented as the sum of these representations.

17,166
0.15 stars / hour
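
The "sum of character $n$-gram vectors" representation can be sketched as follows. Several details here are assumptions for illustration: the `<` and `>` boundary markers, the default n-gram range of 3 to 6, and the plain dictionary lookup (a real system would typically hash n-grams into a fixed number of buckets).

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the word wrapped in boundary markers '<' and '>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def word_vector(word, ngram_vectors, dim):
    """Sum the vectors of all known n-grams of the word (unknown n-grams are skipped)."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Toy 2-dimensional n-gram table (made-up values).
table = {"<ca": [1.0, 0.0], "at>": [0.0, 2.0]}
print(char_ngrams("where", 3, 3))
print(word_vector("cat", table, dim=2))
```

Because a word's vector is built from its pieces, a word never seen in training still receives a vector as long as some of its n-grams were seen.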
##### FastText.zip: Compressing text classification models
We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method built upon product quantization to store word embeddings.

17,166
0.15 stars / hour
##### Bag of Tricks for Efficient Text Classification
This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation.

17,166
0.15 stars / hour
##### openpose
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
10,656
0.15 stars / hour
##### Memory-Efficient Implementation of DenseNets
The DenseNet architecture is highly computationally efficient as a result of feature reuse. A 264-layer DenseNet (73M parameters), which previously would have been infeasible to train, can now be trained on a single workstation with 8 NVIDIA Tesla M40 GPUs.
797
0.14 stars / hour
##### Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
Learning when to communicate and doing that effectively is essential in multi-agent tasks. Recent works show that continuous communication allows efficient training with back-propagation in multi-agent scenarios, but have been restricted to fully-cooperative tasks.

25
0.14 stars / hour
##### AllenNLP: A Deep Semantic Natural Language Processing Platform
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily.

4,981
0.14 stars / hour
##### Fully Supervised Speaker Diarization
In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a.

700
0.14 stars / hour
##### McTorch, a manifold optimization library for deep learning
In this paper, we introduce McTorch, a manifold optimization library for deep learning that extends PyTorch. It aims to lower the barrier for users wishing to use manifold constraints in deep learning applications, i.e., when the parameters are constrained to lie on a manifold.

45
0.13 stars / hour
##### Horovod: fast and easy distributed deep learning in TensorFlow
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead.

4,953
0.13 stars / hour
##### CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. Our experiments show that CROSSBOW improves the training time of deep learning models on an 8-GPU server by 1.3-4x compared to TensorFlow.

13
0.13 stars / hour
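
The parallel synchronous SGD scheme attributed to TensorFlow and Caffe2 in the abstract above (partition a batch across workers, average the partial gradients, update one global model) can be sketched with a scalar model. This illustrates only that baseline, not CROSSBOW itself; the squared-error loss, the function names, and the round-robin sharding are assumptions.

```python
def partial_gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y)^2 over one shard, for a scalar weight w."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def sync_sgd_step(w, batch, num_workers, lr):
    """Split the batch across workers, average their partial gradients, update w."""
    shards = [batch[i::num_workers] for i in range(num_workers)]
    grads = [partial_gradient(w, s) for s in shards if s]
    avg = sum(grads) / len(grads)
    return w - lr * avg


# Data generated by y = 2x, so w = 2 is the optimum.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0
for _ in range(50):
    w = sync_sgd_step(w, batch, num_workers=2, lr=0.05)
print(w)
```

Every worker must finish its shard before the average can be taken, which is why the per-worker batch size and the number of GPUs are coupled in this scheme.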
##### Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. Our goal is to learn a mapping $G: X \rightarrow Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$ using an adversarial loss.
91
0.13 stars / hour
##### wav2letter++: The Fastest Open-source Speech Recognition System
This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency.

3,188
0.12 stars / hour
##### ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
In this paper, we present *ProxylessNAS* that can *directly* learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set.
253
0.12 stars / hour