In this work, we re-introduce this information as a new type of input data for trajectory forecasting systems: the local behavior data, which we conceptualize as a collection of location-specific historical trajectories.
Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plug-in module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task and 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty.
Point-NeRF combines the advantages of these two approaches by using neural 3D point clouds, with associated neural features, to model a radiance field.
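A minimal NumPy sketch of the basic idea behind querying such a point-based radiance field: blending the neural features of nearby points with inverse-distance weights. This is an illustration, not the paper's code; `query_point_features` and all names are hypothetical, and Point-NeRF's actual aggregation uses learned MLPs.

```python
import numpy as np

def query_point_features(query, pts, feats, k=8, eps=1e-8):
    """Blend the features of the k nearest neural points around `query`."""
    d2 = np.sum((pts - query) ** 2, axis=1)   # squared distances to all points
    nn = np.argsort(d2)[:k]                   # indices of the k nearest points
    w = 1.0 / (np.sqrt(d2[nn]) + eps)         # inverse-distance weights
    w /= w.sum()                              # normalize to sum to 1
    return w @ feats[nn]                      # weighted feature blend
```

A query landing exactly on a neural point recovers (approximately) that point's feature, while queries between points interpolate smoothly.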
Finally, the probability of occupancy is also integrated into a proposal refinement module to generate the final bounding boxes.
Ranked #2 on 3D Object Detection on KITTI Cars Moderate
The majority of prior monocular depth estimation methods without ground-truth depth guidance focus on driving scenarios.
2) The results on trajectory forecasting benchmarks demonstrate that the CU-based framework consistently helps SOTA systems improve their performance.
Our synergy process leverages a representation cycle for 3DMM parameters and 3D landmarks.
Ranked #1 on Head Pose Estimation on AFLW2000
This work analyzes whether 3D face models can be learned solely from speakers' speech inputs.
This work focuses on complete 3D facial geometry prediction, including 3D facial alignment via 3D face modeling and face orientation estimation using the proposed multi-task, multi-modal, and multi-representation landmark refinement network (M$^3$-LRN).
Mask regression is based on 2D, 2.5D, and 3D ROIs using pseudo-LiDAR and image-based representations.
Ranked #1 on Instance Segmentation on Cityscapes val (using extra training data)
Recent sparse depth completion methods for LiDAR focus only on the lower part of the scene and produce irregular estimates in the upper part, because existing datasets, such as KITTI, do not provide ground truth for upper areas.
Compared with popular sampling methods such as Farthest Point Sampling (FPS) and Ball Query, CAGQ achieves up to 50X speed-up.
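For reference, Farthest Point Sampling, one of the baselines CAGQ is compared against, can be sketched in a few lines of NumPy. This greedy version is the standard algorithm (quadratic in the number of picks times points, which is why faster grouping schemes matter); the function name is illustrative.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Greedily pick k point indices, each maximizing its distance
    to the points already selected."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)   # each point's distance to nearest selected
    selected[0] = 0             # start from an arbitrary (here: first) point
    for i in range(1, k):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected
```

Each iteration updates only the distances to the most recently chosen point, so the whole loop costs O(nk) distance evaluations.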
Such a transformation enables CFCNet to predict features and reconstruct data for missing depth measurements from their corresponding transformed RGB features.
Also, we propose to create building masks from semantic segmentation using an encoder-decoder network.
Reconstructing 3D shapes from single-view images has been a long-standing research problem.
Ranked #1 on Single-View 3D Reconstruction on ShapeNetCore
Given such a source 3D model and a target which can be a 2D image, 3D model, or a point cloud acquired as a depth scan, we introduce 3DN, an end-to-end network that deforms the source model to resemble the target.
In this paper, we propose multi-domain dictionary learning (MDDL) to make dictionary-learning-based classification more robust to data represented in different domains.
In this paper, we introduce a stochastic dynamics video infilling (SDVI) framework to generate frames between long intervals in a video.
Convolutional neural networks (CNNs) are limited in their ability to handle geometric information due to their fixed grid kernel structure.
Ranked #5 on Semantic Segmentation on Stanford2D3D
Specifically, we introduce a "try-and-learn" algorithm to train pruning agents that remove unnecessary CNN filters in a data-driven way.
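The paper's "try-and-learn" agents are trained to make these pruning decisions; as a simpler grounded illustration of removing CNN filters (explicitly not the paper's method), here is the common L1-magnitude baseline for ranking which filters to keep. All names are hypothetical.

```python
import numpy as np

def prune_filters_by_l1(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return sorted indices of conv filters to keep, ranked by L1 norm.

    `weights` has shape (out_channels, in_channels, kh, kw).
    """
    # Score each output filter by the L1 norm of its weights.
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    # Keep the highest-scoring filters, returned in index order.
    return np.sort(np.argsort(scores)[::-1][:n_keep])
```

Magnitude-based criteria like this are purely weight-driven; the data-driven agents described above instead learn which filters matter from the network's behavior on held-out data.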
Experimental results on various 3D scenes show the effectiveness of our method on 3D instance segmentation, and we also evaluate the capability of SGPN to improve 3D object detection and semantic segmentation results.
Ranked #1 on 3D Semantic Instance Segmentation on ScanNetV1
The 3D-ED-GAN is a 3D convolutional neural network trained with a generative adversarial paradigm to fill in missing 3D data at low resolution.
A novel neural network architecture is built for scene labeling tasks, in which a variant of the new RNN unit, the Gated Recurrent Unit with Explicit Long-range Conditioning (GRU-ELC), models multi-scale contextual dependencies in images.