This is a collection of (mostly) pen-and-paper exercises in machine learning.
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.
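To make "parameterized by fully connected neural networks" concrete, here is a minimal sketch of a coordinate-based primitive: a small NumPy MLP mapping pixel coordinates to color, evaluated once per query point. The layer sizes and the (x, y) -> RGB mapping are illustrative assumptions, not the architecture behind the entry above.

# A minimal sketch (assumption, not the paper's architecture): a "neural graphics
# primitive" here is just a small fully connected network that maps an input
# coordinate to a signal value, e.g. (x, y) -> RGB for a learned image.
import numpy as np

rng = np.random.default_rng(0)

# Two hidden layers of 64 units; the weights would normally be trained by gradient descent.
W1, b1 = rng.normal(size=(2, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 64)), np.zeros(64)
W3, b3 = rng.normal(size=(64, 3)), np.zeros(3)

def primitive(coords):
    """Evaluate the fully connected network at a batch of 2-D coordinates."""
    h = np.maximum(coords @ W1 + b1, 0.0)   # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)        # ReLU
    return h @ W3 + b3                      # raw RGB prediction

# Rendering an image means running the network once per pixel, which is why
# purely MLP-based primitives are costly to train and evaluate.
coords = np.stack(np.meshgrid(np.linspace(0, 1, 256),
                              np.linspace(0, 1, 256)), axis=-1).reshape(-1, 2)
rgb = primitive(coords)  # shape (65536, 3)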
In our model, medical text annotation is introduced to compensate for the quality deficiency of the image data.
Ranked #1 on Medical Image Segmentation on MoNuSeg.
The goal of multi-object tracking (MOT) is to detect and track all the objects in a scene while maintaining a unique identifier for each object.
Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data).
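To make the task definition above concrete, here is a minimal tracking-by-detection sketch under stated assumptions: detections are already available per frame as (x1, y1, x2, y2) boxes, and association uses greedy IoU matching with a hypothetical threshold of 0.3. It illustrates the problem of keeping identifiers consistent across frames, not the method behind the MOT20 entry.

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def track(frames, iou_threshold=0.3):
    """frames: list of per-frame detection lists; returns per-frame (id, box) pairs."""
    tracks, next_id, output = {}, 0, []
    for detections in frames:
        assigned, frame_out = set(), []
        for box in detections:
            # Match each detection to the best unassigned existing track.
            best_id, best_iou = None, iou_threshold
            for tid, prev_box in tracks.items():
                if tid in assigned:
                    continue
                overlap = iou(box, prev_box)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:          # no match: start a track with a fresh identifier
                best_id, next_id = next_id, next_id + 1
            tracks[best_id] = box
            assigned.add(best_id)
            frame_out.append((best_id, box))
        output.append(frame_out)
    return output

# Example: one object moving right keeps id 0; a new object in frame 3 gets id 1.
print(track([[(0, 0, 10, 10)], [(2, 0, 12, 10)], [(4, 0, 14, 10), (50, 50, 60, 60)]]))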
In this work, we present Text2Human, a text-driven controllable framework for high-quality and diverse human generation.
We test language models on our forecasting task and find that performance is far below a human expert baseline.
Furthermore, performance improvements have largely been achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Ranked #1 on Image Captioning on nocaps-val-out-domain.
The explosive growth of video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost.
Ranked #13 on Video Object Detection on ImageNet VID.
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
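The entry gives no implementation detail, so the following is only a generic sketch of what abstracting over existing DL frameworks can look like in principle: a registry of backend operations behind one templated function body. The names set_backend and linear are hypothetical and are not Ivy's API.

# A hedged sketch of framework abstraction in general (not Ivy's actual interface):
# the same function body runs against whichever backend is currently selected.
import numpy as np

_BACKENDS = {
    "numpy": {"matmul": np.matmul, "mean": np.mean, "array": np.asarray},
}

try:  # register torch only if it is installed
    import torch
    _BACKENDS["torch"] = {"matmul": torch.matmul, "mean": torch.mean, "array": torch.as_tensor}
except ImportError:
    pass

_active = "numpy"

def set_backend(name):
    global _active
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    _active = name

def linear(x, w):
    """Backend-agnostic example op: identical code for NumPy or PyTorch tensors."""
    ops = _BACKENDS[_active]
    return ops["mean"](ops["matmul"](ops["array"](x), ops["array"](w)))

print(linear([[1.0, 2.0]], [[3.0], [4.0]]))  # 11.0 with the NumPy backend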
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% and a 28% reduction in FLOPs.
Ranked #37 on Semantic Segmentation on PASCAL VOC 2012 test.
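For reference, a short sketch of the metric and arithmetic behind the numbers quoted above: top-1 accuracy counts how often the highest-scoring class equals the true label, and an absolute gain is a plain difference in percentage points. The logits below are toy values, and the MobileViT figure is only what the quoted 2.2% gain implies.

import numpy as np

logits = np.array([[0.1, 2.3, 0.5],     # predicted class 1
                   [1.9, 0.2, 0.4],     # predicted class 0
                   [0.3, 0.1, 0.2]])    # predicted class 0
labels = np.array([1, 0, 2])

top1 = (logits.argmax(axis=1) == labels).mean() * 100
print(f"top-1 accuracy: {top1:.1f}%")                       # 66.7% on this toy batch

edgenext, gain = 71.2, 2.2
print(f"implied MobileViT top-1: {edgenext - gain:.1f}%")   # 69.0%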