State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
Ranked #1 on Zero-Shot Transfer Image Classification on aYahoo
A point cloud is an important type of geometric data structure.
Ranked #2 on Scene Segmentation on ScanNet
Compared to YOLOv2 on MS-COCO object detection, ESPNetv2 delivers 4.4% higher accuracy with 6x fewer FLOPs.
Ranked #30 on Object Detection on PASCAL VOC 2007
By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection.
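As a rough illustration of how a Jigsaw pretext task can produce training labels without manual annotation, the sketch below cuts an image into a 3x3 grid, reorders the tiles by a permutation drawn from a fixed set, and uses the permutation's index as the label. The helper names (`split_into_tiles`, `make_jigsaw_example`), the toy 6x6 image, and the two-entry permutation set are assumptions made for this example and are not taken from the paper.

```python
import random


def split_into_tiles(image, grid=3):
    """Cut a square 2D list into grid*grid tiles, in row-major order."""
    size = len(image) // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * size:(r + 1) * size]
            tiles.append([row[c * size:(c + 1) * size] for row in rows])
    return tiles


def make_jigsaw_example(image, permutations, rng):
    """Return (shuffled_tiles, label); the label is the permutation index."""
    tiles = split_into_tiles(image)
    label = rng.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


# Toy 6x6 "image" whose pixel values encode their own position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
# Hypothetical permutation set: identity and full reversal.
permutations = [tuple(range(9)), tuple(reversed(range(9)))]
shuffled, label = make_jigsaw_example(image, permutations, random.Random(0))
```

A network trained on such pairs would take the shuffled tiles as input and predict `label`, so the supervision comes entirely from the image itself.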
Since the output of event cameras is fundamentally different from conventional cameras, it is commonly accepted that they require the development of specialized algorithms to accommodate the particular nature of events.
We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.
Ranked #28 on Self-Supervised Image Classification on ImageNet
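For readers unfamiliar with contrastive objectives, here is a minimal sketch of a generic InfoNCE-style loss over two views. It assumes that `view_a[i]` and `view_b[i]` are embeddings of the same scene and treats every other pairing in the batch as a negative. The function name, the temperature value, and the toy embeddings are illustrative assumptions, and this is not necessarily the exact formulation used in the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_loss(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss; view_a[i] and view_b[i] form the positive pair."""
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every candidate in the other view.
        logits = [dot(anchor, other) / temperature for other in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


# Matching views should score a much lower loss than mismatched ones.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each anchor is most similar to its own counterpart and grows when the positives are indistinguishable from, or less similar than, the negatives.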
This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs.
Ranked #4 on Node Classification on PPI
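The idea of carrying residual connections over from CNNs to GCNs can be sketched as a single layer that adds its input back to the result of a graph convolution. The version below is a simplified assumption for illustration: it uses a row-normalized adjacency matrix with self-loops, a square weight matrix, and a ReLU, all in plain Python lists. It is not the architecture from the paper, and the function names are hypothetical.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def normalized_adjacency(adj):
    """Row-normalize A + I so each node averages itself and its neighbours."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def residual_gcn_layer(features, adj_norm, weight):
    """H_out = H + ReLU(A_norm @ H @ W); W must be square to match shapes."""
    aggregated = matmul(matmul(adj_norm, features), weight)
    return [[h + max(0.0, z) for h, z in zip(h_row, z_row)]
            for h_row, z_row in zip(features, aggregated)]


# Two connected nodes with one-hot features.
adj_norm = normalized_adjacency([[0.0, 1.0], [1.0, 0.0]])
features = [[1.0, 0.0], [0.0, 1.0]]
identity_w = [[1.0, 0.0], [0.0, 1.0]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
out_identity = residual_gcn_layer(features, adj_norm, identity_w)
out_zero = residual_gcn_layer(features, adj_norm, zero_w)
```

With an all-zero weight matrix the layer reduces to the identity mapping, which is the property that lets many such layers be stacked without the signal vanishing.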
In this paper, we address the problem of reducing the memory footprint of convolutional network architectures.
Object detection in aerial images is an active yet challenging task in computer vision because of the bird's-eye view perspective, the highly complex backgrounds, and the variant appearances of objects.
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.