droidlet: modular, heterogenous, multi-modal agents

25 Jan 2021

In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale.

374 stars
1.46 stars / hour

U$^2$-Net: Going Deeper with Nested U-Structure for Salient Object Detection

18 May 2020

In this paper, we design a simple yet powerful deep network architecture, U$^2$-Net, for salient object detection (SOD).

257 stars
1.00 stars / hour

Open-World Entity Segmentation

29 Jul 2021

We introduce a new image segmentation task, termed Entity Segmentation (ES), which aims to segment all visual entities in an image without considering semantic category labels.

97 stars
0.87 stars / hour

Zero-Shot Text-to-Image Generation

24 Feb 2021

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.

180 stars
0.86 stars / hour

YOLOX: Exceeding YOLO Series in 2021

18 Jul 2021

In this report, we present some experienced improvements to the YOLO series, forming a new high-performance detector, YOLOX.

2,929 stars
0.80 stars / hour

Contextual Transformer Networks for Visual Recognition

26 Jul 2021

Such a design fully capitalizes on the contextual information among input keys to guide the learning of a dynamic attention matrix, and thus strengthens the capacity of visual representation.

140 stars
0.74 stars / hour

Epistemic Neural Networks

19 Jul 2021

All existing approaches to uncertainty modeling can be expressed as ENNs, and any ENN can be identified with a Bayesian neural network.

94 stars
0.42 stars / hour

Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

18 May 2021

We demonstrate that formalized fairness metrics and quantitative analysis on their own are insufficient for capturing the risk of representational harm in automatic cropping.

139 stars
0.41 stars / hour

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

30 Jul 2021

We regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN ($\text{H}^2$3D R-CNN), to address the problem of 3D object detection.

15 stars
0.38 stars / hour

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

26 Jul 2021

Derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets.

59 stars
0.34 stars / hour