From an engineering perspective, MCDD significantly advances automated coral detection in challenging underwater conditions, providing a reliable solution for monitoring marine ecosystems.
Ranked #1 on 2D Object Detection on SCoralDet Dataset (using extra training data)
In this paper, we introduce Latent Bridge Matching (LBM), a new, versatile and scalable method that relies on Bridge Matching in a latent space to achieve fast image-to-image translation.
Since we did not change the overall training framework of SyncNet, our experience can also be applied to other lip sync and audio-driven portrait animation methods that utilize SyncNet.
Normalization layers are ubiquitous in modern neural networks and have long been considered essential.
We present HourVideo, a benchmark dataset for hour-long video-language understanding.
We present Next-Scale Autoregression Conditioned by View (ArchonView), a method that significantly exceeds state-of-the-art methods despite being trained from scratch with 3D rendering data only and no 2D pretraining.
Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks.
Object detection and segmentation are widely employed in computer vision applications, yet conventional models such as the YOLO series, while efficient and accurate, are limited to predefined categories, which hinders their adaptability in open scenarios.
However, the diagnostic accuracy and specificity of existing heuristic-based RAG models used in the medical domain are inadequate, particularly for diseases with similar manifestations.
Alternatively, zeroth-order (ZO) techniques can estimate gradients using only forward operations, eliminating the need to store activations.
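To illustrate the idea of forward-only gradient estimation, here is a minimal sketch of one common zeroth-order estimator: a two-point finite difference along random Gaussian directions, averaged over several samples. It is not taken from the paper excerpted above; the function name `zo_gradient`, the parameters `eps` and `num_samples`, and the toy objective `quadratic` are all assumptions chosen for illustration.

```python
import random


def zo_gradient(f, x, eps=1e-3, num_samples=5000, seed=0):
    """Estimate the gradient of f at x using only function evaluations.

    Hypothetical helper for illustration: averages two-point finite
    differences taken along random Gaussian directions.
    """
    rng = random.Random(seed)
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        # Draw a random perturbation direction u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
        x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
        # Two forward evaluations give the slope of f along u.
        # No backward pass is run and no activations are stored.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * eps)
        for i in range(dim):
            grad[i] += slope * u[i] / num_samples
    return grad


def quadratic(x):
    """Toy objective (an assumption): f(x) = sum of squares, gradient 2x."""
    return sum(xi * xi for xi in x)


estimate = zo_gradient(quadratic, [1.0, -2.0, 0.5])
print(estimate)  # a noisy approximation of the true gradient [2.0, -4.0, 1.0]
```

The estimate is noisy, and its variance grows with the number of parameters, so practical ZO methods trade memory savings for more function evaluations or noisier updates.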