In our model, medical text annotation is introduced to compensate for quality deficiencies in the image data.
Ranked #1 on Medical Image Segmentation on MoNuSeg
Human mobility data contains rich and abundant information, which can yield comprehensive region embeddings for cross-domain tasks.
We test language models on our forecasting task and find that performance is far below a human expert baseline.
The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence.
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
The goal of multi-object tracking (MOT) is to detect and track all objects in a scene while keeping a unique identifier for each object.
Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)
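To make the "unique identifier per object" requirement of MOT concrete, here is a minimal Python sketch of a greedy IoU-based association step between consecutive frames. It is not the method of the paper listed above. The class name `GreedyIoUTracker`, the 0.3 IoU threshold, and the `(x1, y1, x2, y2)` box format are illustrative assumptions, and the sketch deliberately omits motion models and re-identification.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Hypothetical minimal tracker: links each detection to the previous
    frame's box with the highest IoU, otherwise assigns a fresh track ID."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold  # assumed value, not from the source
        self.tracks = {}  # track id -> box from the previous frame
        self._next_id = count(1)

    def update(self, detections):
        # All (track, detection) pairs, sorted by IoU from highest to lowest
        pairs = sorted(
            ((iou(box, det), tid, di)
             for tid, box in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        assigned = {}  # detection index -> track id
        used_tracks = set()
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to be the same object
            if tid in used_tracks or di in assigned:
                continue  # each track and each detection is matched at most once
            assigned[di] = tid
            used_tracks.add(tid)
        # Unmatched detections start new tracks with fresh, never-reused IDs
        for di in range(len(detections)):
            if di not in assigned:
                assigned[di] = next(self._next_id)
        # Tracks without a match in this frame are dropped (no re-identification)
        self.tracks = {assigned[di]: det for di, det in enumerate(detections)}
        return [assigned[di] for di in range(len(detections))]


tracker = GreedyIoUTracker()
print(tracker.update([(0, 0, 10, 10), (20, 20, 30, 30)]))  # [1, 2]
print(tracker.update([(21, 21, 31, 31), (1, 1, 11, 11)]))  # [2, 1]
```

In the second frame the detections arrive in a different order, yet each box keeps the ID of the object it overlaps most. Greedy matching is used here only for brevity; Hungarian assignment is a common alternative.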
Attention-based models trained on protein sequences have demonstrated notable success at classification and generation tasks relevant to AI-driven protein design.
Furthermore, performance improvements have largely been achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Ranked #1 on Image Captioning on nocaps-val-out-domain
The explosive growth in video streaming poses challenges for performing video understanding at high accuracy and low computational cost.
Ranked #13 on Video Object Detection on ImageNet VID