Interactive retrieval for online fashion shopping enables image retrieval results to be refined according to user feedback.
We have defined two main tasks on this dataset: dense semantic segmentation and multi-class lane-marking prediction.
During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular.
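The combined objective above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the oriented box is parameterized by its four corners, and the rectangularity term uses the fact that a quadrilateral is a rectangle exactly when its two diagonals share a midpoint and have equal length. The function names, corner ordering, and weighting `lam` are all hypothetical.

```python
import numpy as np

def rectangularity_penalty(corners):
    """Zero iff the four corners form a rectangle.

    `corners` is a (4, 2) array ordered (p0, p1, p2, p3) around the
    quadrilateral, so the diagonals are p0-p2 and p1-p3.  A quadrilateral
    is a rectangle exactly when the diagonals bisect each other (shared
    midpoint) and are equally long.  (Illustrative parameterization.)
    """
    p0, p1, p2, p3 = np.asarray(corners, dtype=float)
    mid_diff = (p0 + p2) / 2 - (p1 + p3) / 2               # midpoints must coincide
    len_diff = np.linalg.norm(p2 - p0) - np.linalg.norm(p3 - p1)  # equal diagonals
    return float(np.dot(mid_diff, mid_diff) + len_diff ** 2)

def combined_loss(hbb_pred, hbb_gt, obb_pred, obb_gt, lam=1.0):
    """Joint loss: L1 on the horizontal box, L1 on the oriented-box
    corners, plus the rectangularity penalty weighted by `lam`."""
    l_hbb = np.abs(np.asarray(hbb_pred, float) - np.asarray(hbb_gt, float)).mean()
    l_obb = np.abs(np.asarray(obb_pred, float) - np.asarray(obb_gt, float)).mean()
    return l_hbb + l_obb + lam * rectangularity_penalty(obb_pred)
```

For a prediction whose corners already form a rectangle the penalty term vanishes, so gradients come only from the two regression losses; skewed quadrilaterals are pushed back toward rectangular shape.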
Most saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs and may additionally incorporate top-down cues using face or text detection.
Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty of acquiring and learning from large quantities of video data.
We provide quantitative experimental evidence suggesting that (i) modern deep learning algorithms pre-trained on real data behave similarly in real and virtual worlds, and (ii) pre-training on virtual data improves performance.
We quantitatively measure the benefit of our domain adaptation strategy on the KITTI tracking benchmark and on a new dataset (PASCAL-to-KITTI) we introduce to study the domain mismatch problem in MOT.
Our models outperform the state of the art on MIT1003, the dataset on which the features and classifiers are learned.