This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language.
Ranked #2 on Question Answering on SQA3D
Our benchmark inherently captures the disappearance and re-appearance of agents, which raises the emerging challenge of forecasting the motion of occluded agents: a safety-critical problem that snapshot-based benchmarks overlook.
To further enhance multi-view consistency, we augment the uncertainty network with the global 3D structure optimized by a voxelized neural radiance field (Voxel-NeRF).
It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects.
In LiDAR-based 3D object detection for autonomous driving, the ratio of object size to input scene size is significantly smaller than in 2D detection.
Ranked #3 on 3D Object Detection on waymo cyclist
We employ a simple Kalman filter for trajectory prediction and preserve the tracklet by prediction when the target is not visible.
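The sentence above only says that a "simple Kalman filter" predicts trajectories and keeps the tracklet alive through prediction while the target is unseen. The sketch below shows one common way to do that, under stated assumptions. It uses a constant-velocity model applied to a single axis, with state [position, velocity]. When a frame has no detection, the tracker runs only the predict step, so the tracklet continues with an extrapolated position. The class `ConstantVelocityKF`, the helper `track`, and the parameters `dt`, `q`, and `r` are hypothetical illustrations. They are not the authors' code, and the actual motion model and noise settings are not given in the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConstantVelocityKF:
    """Hypothetical 1-D constant-velocity Kalman filter; state is [pos, vel]."""
    pos: float
    vel: float = 0.0
    dt: float = 1.0   # assumed frame interval
    q: float = 0.01   # assumed process-noise scale (Q = q * I)
    r: float = 0.25   # assumed measurement-noise variance
    P: List[List[float]] = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])

    def predict(self) -> float:
        dt = self.dt
        # x <- F x, with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        (p00, p01), (p10, p11) = self.P
        # P <- F P F^T + Q
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.pos

    def update(self, z: float) -> None:
        # The detection observes position only: H = [1, 0]
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                 # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P <- (I - K H) P
        self.P = [
            [p00 - k0 * p00, p01 - k0 * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


def track(detections: List[Optional[float]], **kf_args) -> List[float]:
    """Return one position per frame after the first detection.

    A None entry means the target is not visible: the tracklet is kept
    alive using the predicted position alone.
    """
    kf: Optional[ConstantVelocityKF] = None
    estimates: List[float] = []
    for z in detections:
        if kf is None:
            if z is None:
                continue  # no track yet; nothing to predict from
            kf = ConstantVelocityKF(pos=z, **kf_args)
            estimates.append(kf.pos)
            continue
        kf.predict()
        if z is not None:
            kf.update(z)
        estimates.append(kf.pos)
    return estimates
```

For example, `track([0.0, 1.0, 2.0, 3.0, None, None, 6.0])` yields seven estimates. The two occluded frames are filled by extrapolation rather than by dropping the tracklet. A real 3D tracker would typically use a joint state over x, y, z (and possibly heading and box size), but the predict-only-when-unseen logic is the same.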
3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years, especially those under the "tracking-by-detection" paradigm.
Ranked #1 on 3D Multi-Object Tracking on Waymo Open Dataset
The code and protocols for our benchmark and algorithm are available at https://github.com/TuSimple/LiDAR_SOT/.
Indeed, even most few-shot learning methods rely on a large set of "base classes" for pretraining.