In this way, we can exploit more contextual information to detect lanes while leveraging local detailed lane features to improve localization accuracy.
Ranked #1 on Lane Detection on CULane
This network is learned by minimizing our newly-proposed 3D alignment loss between the 3D box estimates and the corresponding RoI LiDAR points.
In this paper, the VLAD aggregation method is adopted to quantize local features with visual vocabulary locally partitioning the feature space, and hence preserve the local discriminability.
Sign language translation as a kind of technology with profound social significance has attracted growing researchers' interest in recent years.
This characteristic indicates that monocular 3D detection is inherently different from other typical detection tasks that have the same dimensional input and output.
In this work, inspired by the fact that users make their click decisions mostly based on the visual impression they perceive when browsing news, we propose to capture such visual impression information with visual-semantic modeling for news recommendation.
Lip reading, aiming to recognize spoken sentences according to the given video of lip movements without relying on the audio stream, has attracted great interest due to its application in many scenarios.
On the one hand, CEL blends each embedding with multiple patches of different scales, providing the self-attention module itself with cross-scale features.
Ranked #26 on Semantic Segmentation on ADE20K val
Few-shot learning (FSL) aims to learn a classifier that can be easily adapted to accommodate new tasks not seen during training, given only a few examples.
In this paper, we study the Salient Object Ranking (SOR) task, which manages to assign a ranking order of each detected object according to its visual saliency.
In this paper, we propose a DiscRiminative-gEnerative duAl Memory (DREAM) anomaly detection model to take advantage of a few anomalies and solve data imbalance.
3D object detection algorithms for autonomous driving reason about 3D obstacles either from 3D birds-eye view or perspective view or both.
In this paper, we argue that a Mutual Nearest Neighbor (MNN) relation should be established to explicitly select the query descriptors that are most relevant to each task and discard less relevant ones from aggregative clutters in FSL.
In this paper, we propose a novel network, Erasing-Salient Net (ES-Net), to learn comprehensive features by erasing the salient areas in an image.
Specifically, it first casts the relationships between a certain model's accuracy and depth/width/resolution into a polynomial regression and then maximizes the polynomial to acquire the optimal values for the three dimensions.
Therefore, it is critical to learn an apparel-invariant person representation under cases like cloth changing or several persons wearing similar clothes.
Most deep-learning-based image classification methods assume that all samples are generated under an independent and identically distributed (IID) setting.
Moreover, based on the PACF module, we propose a 3D multi-sensor multi-task network called Pointcloud-Image RCNN(PI-RCNN as brief), which handles the image segmentation and 3D object detection tasks.
In this work, we collect abundant relationships from common user behaviors and item information, and propose a novel framework named IntentGC to leverage both explicit preferences and heterogeneous relationships by graph convolutional networks.
In this paper, we transfer knowledge learned from machine comprehension to the sequence-to-sequence tasks to deepen the understanding of the text.
We observe that people usually use some discourse markers such as "so" or "but" to represent the logical relationship between two sentences.
Ranked #12 on Natural Language Inference on SNLI
Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context.
2) Cross-layer filter comparison is unachievable since the importance is defined locally within each layer.
Dialogue Act Recognition (DAR) is a challenging problem in dialogue interpretation, which aims to attach semantic labels to utterances and characterize the speaker's intention.
In this paper, we propose a novel neural network system that consists a Demand Optimization Model based on a passage-attention neural machine translation and a Reader Model that can find the answer given the optimized question.
Machine Comprehension (MC) is a challenging task in Natural Language Processing field, which aims to guide the machine to comprehend a passage and answer the given question.
Ranked #34 on Question Answering on SQuAD1.1 dev
Molecules can be represented as an undirected graph, and we can utilize graph convolution networks to predication molecular properties.
Ranked #1 on Drug Discovery on HIV dataset
Machine comprehension(MC) style question answering is a representative problem in natural language processing.
Ranked #19 on Question Answering on TriviaQA
By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs.
Based on our theoretical analysis, we propose to first learn the gradient field of the distance function and then learn the distance function itself.
Traditional algorithms for stochastic optimization require projecting the solution at each iteration into a given domain to ensure its feasibility.
MTVFL has the following key properties: (1) the vector fields we learned are close to the gradient fields of the prediction functions; (2) within each task, the vector field is required to be as parallel as possible which is expected to span a low dimensional subspace; (3) the vector fields from all tasks share a low dimensional subspace.
To achieve this goal, we show that the second order smoothness measures the linearity of the function, and the gradient field of a linear function has to be a parallel vector field.
In GNMF, an affinity graph is constructed to encode the geometrical information and we seek a matrix factorization, which respects the graph structure.