When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors.
Ranked #1 on Real-Time Object Detection on MS COCO (using extra training data)
The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications.
First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) to include raw, full-field imagery for greater experimental flexibility.
Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual images but also reasoning about inter-relationships and logical flows across multiple visual inputs.
In this paper, we identify the key issue as the redundant content in videos.
We propose an effective method for inserting adapters into text-to-image foundation models, which enables the execution of complex downstream tasks while preserving the generalization ability of the base model.
This sampling strategy for flow steps can be easily applied to existing flow-matching-based models without retraining.
More crucially, we design the transport map for restoration as a two-pass DA-RCOT map, in which the transport residual is computed in the first pass and then encoded as multi-scale residual embeddings to condition the second-pass restoration.
Ranked #1 on Unified Image Restoration on BSD68 sigma25
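The two-pass structure described for the DA-RCOT map above can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the box-blur `first_pass`, the average-pooled `residual_embeddings`, the subtractive `second_pass`, and the `strength` parameter are all hypothetical stand-ins for learned networks, chosen only to show a residual being computed in the first pass, encoded at several scales, and then used to condition the second pass.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a 2D array by factor k (H and W must be divisible by k)."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def first_pass(degraded):
    """Hypothetical unconditioned restorer: a 3x3 box blur as a stand-in."""
    h, w = degraded.shape
    padded = np.pad(degraded, 1, mode="edge")
    out = np.zeros_like(degraded)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def residual_embeddings(residual, scales=(1, 2, 4)):
    """Encode the residual at several scales (assumed: plain average pooling)."""
    return [avg_pool(residual, k) for k in scales]

def second_pass(degraded, embeddings, strength=0.5):
    """Hypothetical conditioned restorer: subtract the fused multi-scale residual."""
    h, _ = degraded.shape
    cond = np.zeros_like(degraded)
    for e in embeddings:
        k = h // e.shape[0]
        # Nearest-neighbour upsampling back to full resolution.
        cond += np.kron(e, np.ones((k, k)))
    cond /= len(embeddings)
    return degraded - strength * cond

def two_pass_restore(degraded):
    """First pass yields the residual; second pass is conditioned on it."""
    coarse = first_pass(degraded)
    residual = degraded - coarse
    embeddings = residual_embeddings(residual)
    return second_pass(degraded, embeddings), embeddings

# Usage on a random 8x8 float "image".
image = np.random.default_rng(0).normal(size=(8, 8))
restored, embeddings = two_pass_restore(image)
```

In the paper's setting both passes would be learned transport maps; the sketch only mirrors the order of operations.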
Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis.
It can understand visual, auditory, and textual modalities, directly output audio, and support flexible duplex interaction.