In our model, medical text annotation is introduced to compensate for the quality deficiency in image data.
Ranked #1 on Medical Image Segmentation on MoNuSeg
The goal of multi-object tracking (MOT) is detecting and tracking all the objects in a scene, while keeping a unique identifier for each object.
Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)
In this work, we present a text-driven controllable framework, Text2Human, for a high-quality and diverse human generation.
We test language models on our forecasting task and find that performance is far below a human expert baseline.
Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Ranked #1 on Image Captioning on nocaps-val-out-domain
The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.
Ranked #13 on Video Object Detection on ImageNet VID
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
Our EdgeNeXt model with 1. 3M parameters achieves 71. 2\% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2. 2\% with 28\% reduction in FLOPs.
Ranked #37 on Semantic Segmentation on PASCAL VOC 2012 test