LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset, demonstrating for the first time that the major LiDAR perception tasks can be unified in a single strong network that is trained end-to-end and achieves state-of-the-art performance.
It then uses the features of the center candidates as the query embeddings in the transformer.
Ranked #1 on 3D Object Detection on Waymo Vehicle
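The center-query idea above can be sketched as follows; the function name, tensor shapes, and top-k peak selection are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def center_queries(bev_feat, heatmap, k=3):
    """Gather BEV features at the k strongest center-heatmap peaks and use
    them as transformer query embeddings.
    bev_feat: (C, H, W) feature map; heatmap: (H, W) center scores.
    Returns a (k, C) array, one query per center candidate."""
    h, w = heatmap.shape
    topk = np.argsort(heatmap.ravel())[::-1][:k]   # k strongest center candidates
    ys, xs = np.unravel_index(topk, (h, w))
    return bev_feat[:, ys, xs].T                   # (k, C)

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))
heat = rng.random(size=(16, 16))
print(center_queries(feat, heat, k=5).shape)  # prints (5, 8)
```

Because each query is initialized from the feature at a likely object center, the transformer starts from localized, object-specific embeddings rather than learned free queries.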
This technical report presents the 1st place winning solution for the Waymo Open Dataset 3D semantic segmentation challenge 2022.
We propose a fusion algorithm for haze removal that combines color information from an RGB image and edge information extracted from its corresponding NIR image using Haar wavelets.
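A single-level version of that Haar-domain fusion can be sketched as below: the low-frequency (color) subband is kept from the RGB luminance while the detail (edge) subbands are taken from the NIR image. The function names and the one-level decomposition are simplifying assumptions, not the paper's exact pipeline:

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar transform; img must have even height and width."""
    a = (img[0::2] + img[1::2]) / 2.0   # row-pair average
    d = (img[0::2] - img[1::2]) / 2.0   # row-pair difference
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    a = np.empty((LL.shape[0], 2 * LL.shape[1]))
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d = np.empty_like(a)
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    out = np.empty((2 * a.shape[0], a.shape[1]))
    out[0::2], out[1::2] = a + d, a - d
    return out

def fuse(rgb_luma, nir):
    """Keep the approximation subband from the RGB luminance and inject
    the Haar detail (edge) subbands extracted from the NIR image."""
    LLr, _, _, _ = haar2d(rgb_luma)
    _, LHn, HLn, HHn = haar2d(nir)
    return ihaar2d(LLr, LHn, HLn, HHn)

rgb_luma = np.outer(np.arange(8.0), np.ones(8))  # smooth gradient stands in for hazy luminance
nir = np.zeros((8, 8)); nir[:, 4:] = 1.0         # sharp edge stands in for NIR detail
fused = fuse(rgb_luma, nir)                      # gradient with the NIR edge injected
```

With real images one would derive the luminance from the RGB channels and typically fuse across several decomposition levels; a single level is shown here for brevity.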
With the explosive growth of livestream broadcasting, there is an urgent need for new summarization technology that enables us to create a preview of streamed content and tap into this wealth of knowledge.
Panoptic segmentation presents a new challenge in exploiting the merits of both detection and segmentation, with the aim of unifying instance segmentation and semantic segmentation in a single framework.
Deep neural network based object detection has become the cornerstone of many real-world applications.
The need for fine-grained perception in autonomous driving systems has resulted in recently increased research on online semantic segmentation of single-scan LiDAR.
Ranked #14 on 3D Semantic Segmentation on SemanticKITTI
To better represent and capture long-term spatio-temporal relationships, we propose three variants of the Self-Attention Network (SAN), namely SAN-V1, SAN-V2, and SAN-V3.
Ranked #46 on Skeleton Based Action Recognition on NTU RGB+D
Having emerged as one of the best-performing techniques for extractive summarization, determinantal point processes select the most probable set of sentences to form a summary according to a probability measure defined by modeling sentence prominence and pairwise repulsion.
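A minimal sketch of this selection step is greedy MAP inference under a quality-diversity DPP kernel; the kernel construction below is the standard formulation, not necessarily the exact one used in the paper:

```python
import numpy as np

def dpp_greedy(L, k):
    """Greedy MAP inference for a DPP with kernel L: repeatedly add the
    sentence that most increases log det of the selected submatrix."""
    selected, remaining = [], list(range(L.shape[0]))
    for _ in range(k):
        base = np.linalg.slogdet(L[np.ix_(selected, selected)])[1] if selected else 0.0
        best, best_gain = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gain = (logdet if sign > 0 else -np.inf) - base
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        remaining.remove(best)
    return selected

q = np.array([1.0, 1.0, 0.8])            # sentence "prominence" (quality) scores
S = np.eye(3); S[0, 1] = S[1, 0] = 0.99  # sentences 0 and 1 are near-duplicates
L = np.diag(q) @ S @ np.diag(q)          # quality-diversity DPP kernel
print(dpp_greedy(L, k=2))  # prints [0, 2]
```

The repulsion term makes the determinant of the {0, 1} submatrix nearly zero, so the greedy step prefers the lower-quality but dissimilar sentence 2 over the duplicate.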
Instead, the regularizing effects of assuming a prior over parameters are seen through maximizing the probabilities of models or, in information-theoretic terms, minimizing the information content of a model.
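The equivalence alluded to here is the standard MAP / minimum-description-length identity (a textbook relation, not a result specific to this work):

```latex
\underbrace{-\log p(\theta \mid D)}_{\text{posterior}}
  \;=\; \underbrace{-\log p(D \mid \theta)}_{\text{data code length}}
  \;+\; \underbrace{-\log p(\theta)}_{\text{model code length}}
  \;+\; \text{const},
```

so maximizing the posterior over parameters is the same as minimizing the combined number of bits needed to encode the data given the model plus the model itself.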
We introduce a computationally-efficient CNN micro-architecture Slim Module to design a lightweight deep neural network Slim-Net for face attribute prediction.
Video-based CNN works have focused on effective ways to fuse appearance and motion networks, but they typically fail to exploit temporal information across video frames.
The most important obstacles facing multi-document summarization include excessive redundancy in source descriptions and the looming shortage of training data.
In particular, we learn a camouflage pattern to hide vehicles from being detected by state-of-the-art convolutional neural network based detectors.
Hence, we propose a curriculum-style learning approach to minimizing the domain gap in urban scene semantic segmentation.
Ranked #24 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
One of the main approaches explored in the literature to tackle the problems of size and dimensionality is sampling subsets of the data in order to estimate the characteristics of the whole population, e.g., the underlying clusters or structures in the data.
In other words, ComDefend can transform the adversarial image into its clean version, which is then fed to the trained classifier.
We propose a new way of thinking about deep neural networks, in which the linear and non-linear components of the network are naturally derived and justified in terms of principles in probability theory.
(ii) We demonstrate that delayed association is better suited for clustering subspaces that have ambiguities, i.e., when subspaces intersect or data are contaminated with outliers/noise.
We propose a new deep learning approach for automatic detection and segmentation of fluid within retinal OCT images.
In this work, we present a method for improving a random sample consensus (RANSAC) based image segmentation algorithm by encapsulating it within a convolutional neural network (CNN).
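The RANSAC component can be illustrated in isolation with the classic line-fitting form of the algorithm (a generic sketch; the paper wraps a segmentation-specific variant inside a CNN):

```python
import random

def ransac_line(points, n_iters=200, thresh=0.1, seed=0):
    """Fit y = a*x + b by RANSAC: repeatedly sample a minimal set of two
    points, hypothesize a line, and keep the model with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:                       # degenerate (vertical) sample
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

pts = [(float(i), 2.0 * i + 1.0) for i in range(10)] + [(2.0, 30.0), (5.0, -40.0)]
(a, b), inliers = ransac_line(pts)
print(round(a, 3), round(b, 3), len(inliers))  # recovers y = 2x + 1 despite the two outliers
```

Consensus voting is what makes the estimate robust: the two gross outliers never gather enough support to outvote the collinear majority.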
In this paper, we show that different body parts do not play equally important roles in recognizing a human action in video data.
At the root of all the above problems is the lack of an efficient run-time solution to the nontrivial problem of rotating wavelets (a non-linear phase shift), which we solve in this paper.
In order to reconstruct a high-spatial/high-spectral resolution multispectral image volume, either the information in the MS and PAN images is fused (i.e., pansharpening) or super-resolution reconstruction (SRR) is used with only MS images captured on different dates.
We propose a novel motion estimation/compensation (ME/MC) method for wavelet-based (in-band) motion compensated temporal filtering (MCTF), with application to low-bitrate video coding.
We first map the input static image to a new domain that we refer to as the Predicted Optical Flow-Saliency Map domain (POF-SM), and then fine-tune the layers of a deep CNN model trained on classifying the ImageNet dataset to perform action classification in the POF-SM domain.
We propose a novel point of view for multiview SRIR: Unlike existing multiview methods that reconstruct the entire spectrum of the HR image from the multiple given LR images, we derive explicit expressions that show how the high-frequency spectra of the unknown HR image are related to the spectra of the LR images.
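That relation is, in essence, the classical aliasing identity for a shift-and-decimate observation model, written here in 1-D with illustrative symbols rather than the paper's exact notation. For an LR image $y_k$ obtained by shifting the HR image $x$ by $t_k$ and downsampling by a factor $D$:

```latex
Y_k(\omega) \;=\; \frac{1}{D} \sum_{m=0}^{D-1}
  e^{-j \frac{\omega - 2\pi m}{D} t_k}\,
  X\!\left(\frac{\omega - 2\pi m}{D}\right),
```

so each LR spectrum is a phase-weighted sum of $D$ shifted copies of the HR spectrum, and given enough distinct shifts $t_k$ the high-frequency terms can be isolated and solved for explicitly.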
In view of these new emerging needs for applications of wavelet encoded imaging, we propose a sub-pixel registration method that can achieve direct wavelet domain registration from a sparse set of coefficients.
A topological subdivisioning is adopted to reduce the connections between the input channels and the output channels.
Deep neural networks have achieved remarkable performance in both image classification and object detection problems, at the cost of a large number of parameters and computational complexity.
To extract their internal dynamics, we devised a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust to even heavily perturbed data.