Existing shadow detection datasets often contain missing or mislabeled shadows, which can hinder the performance of deep learning models trained directly on such data.
This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space.
Besides, we propose B-Seg, a building instance segmentation method to establish UrbanBIS.
3D scene understanding, e. g., point cloud semantic and instance segmentation, often requires large-scale annotated training data, but clearly, point-wise labels are too tedious to prepare.
The core of our approach is a two-stage feature-space alignment strategy that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP features to shapes: to begin with, map the CLIP image feature to the detail-rich 3D shape space of the SVR model, then map the CLIP text feature to the 3D shape space through encouraging the CLIP-consistency between rendered images and the input text.
This paper presents a new approach for 3D shape generation, inversion, and manipulation, through a direct generative modeling on a continuous implicit representation in wavelet domain.
The key of Cart is to utilize the prediction of object structures to connect visual observations with user commands for effective manipulations.
First, we decouple hand mesh reconstruction into two branches, one to exploit finger-level non-occluded information and the other to exploit global hand orientation, with lightweight structures to promote real-time inference.
First, we design SSIS-Track, a new framework to extract shadow-object associations in videos with paired tracking and without category specification; especially, we strive to maintain paired tracking even the objects/shadows are temporarily occluded for several frames.
Importantly, we formulate the lightweight plug-in S2D module and the point cloud reconstruction module in SDet to densify 3D features and train SDet to produce 3D features, following the dense 3D features in DDet.
To reconstruct a new urban scene, we first build the 3D scene proxy, then rely on the predicted reconstruction quality and uncertainty measures by our network, based off of the proxy geometry, to guide the drone path planning.
This paper presents a new approach for 3D shape generation, enabling direct generative modeling on a continuous implicit representation in wavelet domain.
Text-guided 3D shape generation remains challenging due to the absence of large paired text-shape data, the substantial semantic gap between these two modalities, and the structural complexity of 3D shapes.
This paper formulates a new problem, instance shadow detection, which aims to detect shadow instance and the associated object instance that cast each shadow in the input image.
Despite the quality improvement brought by the recent methods, video super-resolution (SR) is still very challenging, especially for videos that are low-light and noisy.
To boost a detector for single-frame 3D object detection, we present a new approach to train it to simulate features and responses following a detector trained on multi-frame point clouds.
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
To guide the network quickly fit the coarse shape, we propose to utilize the signed supervision in regions that are obviously outside the object and can be easily determined, resulting in our semi-signed supervision.
Industrial bin picking is a challenging task that requires accurate and robust segmentation of individual object instances.
To achieve a learnable self-embedding scheme, we design a novel framework with two jointly-trained networks: one to encode the input point set into its self-embedded sparse point set and the other to leverage the embedded information for inverting the original point set back.
We explore the way to alleviate the label-hungry problem in a semi-supervised setting for 3D instance segmentation.
To address the high cost and challenges of 3D point-level labeling, we present a method for semi-supervised point cloud semantic segmentation to adopt unlabeled point clouds in training to boost the model performance.
We present SP-GAN, a new unsupervised sphere-guided generative model for direct synthesis of 3D shapes in the form of point clouds.
Video prediction methods generally consume substantial computing resources in training and deployment, among which keypoint-based approaches show promising improvement in efficiency by simplifying dense image prediction to light keypoint prediction.
Ranked #1 on Video Prediction on KTH
The ability to recognize the position and order of the floor-level lines that divide adjacent building floors can benefit many applications, for example, urban augmented reality (AR).
Instance shadow detection aims to find shadow instances paired with the objects that cast the shadows.
Ranked #1 on Instance Shadow Detection on SOBA
Lastly, to better exploit hard targets, we design an ODIoU loss to supervise the student with constraints on the predicted box centers and orientations.
Ranked #1 on Birds Eye View Object Detection on KITTI Cars Easy
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during the training, so the 2D network can infer without requiring 3D data.
Point cloud semantic segmentation often requires largescale annotated training data, but clearly, point-wise labels are too tedious to prepare.
We present a novel attention-based mechanism to learn enhanced point features for point cloud processing tasks, e. g., classification and segmentation.
Existing single-stage detectors for locating objects in point clouds often treat object localization and category classification as separate tasks, so the localization accuracy and classification confidence may not well align.
Ranked #3 on Birds Eye View Object Detection on KITTI Cars Easy
Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains to make the semantic features more discriminative.
To this end, we present a new domain generalization framework that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains.
To start, we reformulate tiling as a graph problem by modeling candidate tile locations in the target shape as graph nodes and connectivity between tile locations as edges.
Instance segmentation is an important task for scene understanding.
Ranked #5 on 3D Instance Segmentation on STPLS3D
Recently, many deep neural networks were designed to process 3D point clouds, but a common drawback is that rotation invariance is not ensured, leading to poor generalization to arbitrary orientations.
This paper presents a novel non-local part-aware deep neural network to denoise point clouds by exploring the inherent non-local self-similarity in 3D objects and scenes.
We present PointAugment, a new auto-augmentation framework that automatically optimizes and augments point cloud samples to enrich the data diversity when we train a classification network.
Ranked #2 on 3D Point Cloud Data Augmentation on ModelNet40
Then, we pair up the predicted shadow and object instances, and match them with the predicted shadow-object associations to generate the final results.
Ranked #2 on Instance Shadow Detection on SOBA
Shadow detection in general photos is a nontrivial problem, due to the complexity of the real world.
In this paper, we present a novel cross-disease attention network (CANet) to jointly grade DR and DME by exploring the internal relationship between the diseases with only image-level supervision.
To incorporate point features in the edge branch, we establish a hierarchical graph framework, where the graph is initialized from a coarse layer and gradually enriched along the point decoding process.
Ranked #33 on Semantic Segmentation on S3DIS Area5
Besides walls and rooms, we aim to recognize diverse floor plan elements, such as doors, windows and different types of rooms, in the floor layouts.
A hierarchical network is trained using a dataset with over 30K lasso-selection records on two different point cloud data.
Human-Computer Interaction Graphics
We design a novel uncertainty-aware scheme to enable the student model to gradually learn from the meaningful and reliable targets by exploiting the uncertainty information.
Mutually leveraging both low-level feature sharing and high-level prediction correlating, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, and hence can bring about benefits to each other.
Ranked #3 on Surgical tool detection on Cholec80
However, the importance of feature embedding, i. e., exploring the relationship among training samples, is neglected.
Rare diseases have extremely low-data regimes, unlike common diseases with large amount of available labeled data.
The cross-domain discrepancy (domain shift) hinders the generalization of deep neural networks to work on different domain datasets. In this work, we present an unsupervised domain adaptation framework, called Boundary and Entropy-driven Adversarial Learning (BEAL), to improve the OD and OC segmentation performance, especially on the ambiguous boundary regions.
This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples.
This paper presents a new deep neural network design for salient object detection by maximizing the integration of local and global image context within, around, and beyond the salient objects.
In this paper, we present a novel semi-supervised method for medical image segmentation, where the network is optimized by the weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data.
In this paper, we present a novel patchbased Output Space Adversarial Learning framework (pOSAL) to jointly and robustly segment the OD and OC from different fundus image datasets.
Ranked #2 on Optic Disc Segmentation on REFUGE
6 code implementations • 13 Jan 2019 • Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, Fabian Lohöfer, Julian Walter Holch, Wieland Sommer, Felix Hofmann, Alexandre Hostettler, Naama Lev-Cohain, Michal Drozdzal, Michal Marianne Amitai, Refael Vivantik, Jacob Sosna, Ivan Ezhov, Anjany Sekuboyina, Fernando Navarro, Florian Kofler, Johannes C. Paetzold, Suprosanna Shit, Xiaobin Hu, Jana Lipková, Markus Rempfler, Marie Piraud, Jan Kirschke, Benedikt Wiestler, Zhiheng Zhang, Christian Hülsemeyer, Marcel Beetz, Florian Ettlinger, Michela Antonelli, Woong Bae, Míriam Bellver, Lei Bi, Hao Chen, Grzegorz Chlebus, Erik B. Dam, Qi Dou, Chi-Wing Fu, Bogdan Georgescu, Xavier Giró-i-Nieto, Felix Gruen, Xu Han, Pheng-Ann Heng, Jürgen Hesser, Jan Hendrik Moltz, Christian Igel, Fabian Isensee, Paul Jäger, Fucang Jia, Krishna Chaitanya Kaluva, Mahendra Khened, Ildoo Kim, Jae-Hun Kim, Sungwoong Kim, Simon Kohl, Tomasz Konopczynski, Avinash Kori, Ganapathy Krishnamurthi, Fan Li, Hongchao Li, Junbo Li, Xiaomeng Li, John Lowengrub, Jun Ma, Klaus Maier-Hein, Kevis-Kokitsi Maninis, Hans Meine, Dorit Merhof, Akshay Pai, Mathias Perslev, Jens Petersen, Jordi Pont-Tuset, Jin Qi, Xiaojuan Qi, Oliver Rippel, Karsten Roth, Ignacio Sarasua, Andrea Schenk, Zengming Shen, Jordi Torres, Christian Wachinger, Chunliang Wang, Leon Weninger, Jianrong Wu, Daguang Xu, Xiaoping Yang, Simon Chun-Ho Yu, Yading Yuan, Miao Yu, Liping Zhang, Jorge Cardoso, Spyridon Bakas, Rickmer Braren, Volker Heinemann, Christopher Pal, An Tang, Samuel Kadoury, Luc Soler, Bram van Ginneken, Hayit Greenspan, Leo Joskowicz, Bjoern Menze
In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018.
Second, we develop a bidirectional feature pyramid network (BFPN) to aggregate shadow contexts spanned across different CNN layers by deploying two series of RAR modules in the network to iteratively combine and refine context features: one series to refine context features from deep to shallow layers, and another series from shallow to deep layers.
Ranked #3 on Shadow Detection on SBU
In this paper, we present a novel semi-supervised method for skin lesion segmentation, where the network is optimized by the weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data.
In this paper, we present the first deep learning based edge-aware technique to facilitate the consolidation of point clouds.
Our best model achieves 77. 23\%(JA) on the test dataset, outperforming the state-of-the-art challenging methods and further demonstrating the effectiveness of our proposed deeply supervised rotation equivariant segmentation network.
This paper presents a novel deep neural network design for shadow detection and removal by analyzing the spatial image context in a direction-aware manner.
Ranked #6 on Shadow Removal on ISTD
Learning and analyzing 3D point clouds with deep networks is challenging due to the sparseness and irregularity of the data.
Ranked #3 on Point Cloud Super Resolution on SHREC15
To achieve this, we first formulate the direction-aware attention mechanism in a spatial recurrent neural network (RNN) by introducing attention weights when aggregating spatial context features in the RNN.
Ranked #2 on RGB Salient Object Detection on SBU
Our method outperformed other state-of-the-arts on the segmentation results of tumors and achieved very competitive performance for liver segmentation even with a single model.
Ranked #1 on Liver Segmentation on LiTS2017 (Dice metric)