no code implementations • 5 Feb 2025 • Yao Wei, Matteo Toso, Pietro Morerio, Michael Ying Yang, Alessio Del Bue
This principle is implemented by synthesizing 3D humans that interact with the objects composing the scene.
no code implementations • 1 Jan 2025 • Kun Li, George Vosselman, Michael Ying Yang
The goal of referring remote sensing image segmentation (RRSIS) is to extract specific pixel-level regions within an aerial image via a natural language expression.
no code implementations • 17 Jun 2024 • Kun Li, Hao Cheng, George Vosselman, Michael Ying Yang
Previous studies have demonstrated impressive performance in extracting a single target mask through interactive segmentation.
no code implementations • 19 Mar 2024 • Yao Wei, Martin Renqiang Min, George Vosselman, Li Erran Li, Michael Ying Yang
Recent progress has been made in object shape generation with generative models such as diffusion models, which increases shape fidelity.
1 code implementation • 15 Mar 2024 • Florian Kluger, Eric Brachmann, Michael Ying Yang, Bodo Rosenhahn
A RANSAC estimator guided by a neural network fits these primitives to a depth map.
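A minimal sketch of the general neural-guided RANSAC idea: per-point scores (assumed here to come from some upstream network) define a sampling distribution over points back-projected from a depth map, and RANSAC fits a plane primitive using weighted minimal samples. Names, thresholds, and the toy data are illustrative, not the authors' implementation.

```python
import torch

def fit_plane(p3):                       # p3: (3, 3) minimal sample of points
    n = torch.cross(p3[1] - p3[0], p3[2] - p3[0], dim=0)
    n = n / (n.norm() + 1e-8)
    d = -(n * p3[0]).sum()
    return n, d                          # plane: n . x + d = 0

def guided_ransac(points, scores, iters=256, thresh=0.02):
    best_inliers, best_plane = 0, None
    probs = torch.softmax(scores, dim=0)          # learned sampling distribution
    for _ in range(iters):
        idx = torch.multinomial(probs, 3)         # weighted minimal sample
        n, d = fit_plane(points[idx])
        resid = (points @ n + d).abs()
        inliers = (resid < thresh).sum().item()
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers

# Toy usage: 1000 noisy points near the plane z = 0, with uniform scores.
pts = torch.rand(1000, 3)
pts[:, 2] = 0.01 * torch.randn(1000)
plane, n_in = guided_ransac(pts, torch.zeros(1000))
print(n_in, plane)
```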
1 code implementation • 6 Feb 2024 • Kun Li, George Vosselman, Michael Ying Yang
Visual Question Answering (VQA) is a challenging task of predicting the answer to a question about the content of an image.
1 code implementation • 13 Oct 2023 • BiYuan Liu, HuaiXin Chen, Kun Li, Michael Ying Yang
We observe that the current change detection methods struggle with the multitask conflicts between semantic and height change detection tasks.
no code implementations • 31 Aug 2023 • Yao Wei, George Vosselman, Michael Ying Yang
3D building generation with low data acquisition costs, such as single image-to-3D, is becoming increasingly important.
1 code implementation • 5 Jul 2023 • Kun Li, George Vosselman, Michael Ying Yang
Interactive image segmentation aims to segment the target from the background with manual guidance, taking as input multimodal data such as images, clicks, scribbles, and bounding boxes.
1 code implementation • 2 Apr 2023 • Yuren Cong, Wentong Liao, Bodo Rosenhahn, Michael Ying Yang
Learning similarity between scene graphs and images aims to estimate a similarity score given a scene graph and an image.
1 code implementation • 27 Feb 2023 • Mengmeng Liu, Hao Cheng, Lin Chen, Hellward Broszio, Jiangtao Li, Runjiang Zhao, Monika Sester, Michael Ying Yang
Trajectory prediction for autonomous driving must continuously reason about the motion stochasticity of road agents and comply with scene constraints.
1 code implementation • 6 Feb 2023 • Yunshuang Yuan, Hao Cheng, Michael Ying Yang, Monika Sester
Safety is critical for autonomous driving, and one aspect of improving safety is to accurately capture the uncertainties of the perception system, especially knowing the unknown.
no code implementations • 23 Jan 2023 • Kun Li, George Vosselman, Michael Ying Yang
Visual question answering (VQA) is an important and challenging multimodal task in computer vision.
no code implementations • 4 Jan 2023 • Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang
We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions.
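For illustration, a minimal sketch of an InfoNCE-style contrastive loss over paired image and attribute-composition embeddings; this is an assumed generic formulation, not the paper's exact attribute-centric loss.

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_emb, attr_emb, temperature=0.07):
    # img_emb, attr_emb: (B, D); row i of attr_emb is the positive for row i of img_emb
    img_emb = F.normalize(img_emb, dim=-1)
    attr_emb = F.normalize(attr_emb, dim=-1)
    logits = img_emb @ attr_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return F.cross_entropy(logits, targets)              # pull positives, push the rest

loss = attribute_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```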
no code implementations • 11 Nov 2022 • Yuren Cong, Jinhui Yi, Bodo Rosenhahn, Michael Ying Yang
A semantic scene graph-to-video synthesis framework (SSGVS), based on the pre-trained VSG encoder, VQ-VAE, and auto-regressive Transformer, is proposed to synthesize a video given an initial scene image and a non-fixed number of semantic scene graphs.
1 code implementation • 8 Oct 2022 • Yao Wei, George Vosselman, Michael Ying Yang
Generating a 3D point cloud from a single 2D image is of great importance for 3D scene understanding applications.
1 code implementation • 16 Sep 2022 • Hao Cheng, Mengmeng Liu, Lin Chen, Hellward Broszio, Monika Sester, Michael Ying Yang
This paper proposes an attention-based graph model, named GATraj, which achieves a good balance of prediction accuracy and inference speed.
1 code implementation • 27 Jan 2022 • Yuren Cong, Michael Ying Yang, Bodo Rosenhahn
Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy.
1 code implementation • 30 Aug 2021 • Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang, Yi Zuo, Licheng Jiao, Chang Meng, Hao Wang, Jiahao Wang, Yiming Hui, Zhuojun Dong, Jie Zhang, Qianyue Bao, Zixiao Zhang, Fang Liu
This report summarizes the results of the Learning to Understand Aerial Images (LUAI) 2021 challenge held at ICCV 2021, which focuses on object detection and semantic segmentation in aerial images.
no code implementations • ICCV 2021 • Sen He, Wentong Liao, Michael Ying Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang
The generated face image given a target age code is expected to be age-sensitive, as reflected by bio-plausible transformations of shape and texture, while being identity-preserving.
2 code implementations • ICCV 2021 • Yuren Cong, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn, Michael Ying Yang
Compared to the task of scene graph generation from images, it is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames, which allow for a richer semantic interpretation.
1 code implementation • CVPR 2021 • Florian Kluger, Hanno Ackermann, Eric Brachmann, Michael Ying Yang, Bodo Rosenhahn
A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map.
1 code implementation • CVPR 2022 • Kai Hu, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn
Text-to-image synthesis (T2I) aims to generate photo-realistic images which are semantically consistent with the text descriptions.
1 code implementation • CVPR 2021 • Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang
We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators.
Ranked #1 on Layout-to-Image Generation on COCO-Stuff 128x128
1 code implementation • 5 Feb 2021 • Ye Lyu, George Vosselman, Gui-Song Xia, Michael Ying Yang
Semantic segmentation for aerial platforms has been one of the fundamental scene understanding tasks for earth observation.
1 code implementation • 19 Dec 2020 • Logambal Madhuanand, Francesco Nex, Michael Ying Yang
Monocular video frames are used to train the deep learning model, which learns depth and pose information jointly through two separate networks, one for depth and one for pose.
no code implementations • 18 Dec 2020 • Yaping Lin, George Vosselman, Yanpeng Cao, Michael Ying Yang
Interpretation of Airborne Laser Scanning (ALS) point clouds is a critical procedure for producing various geo-information products like 3D city models, digital terrain models and land use maps.
no code implementations • 7 Dec 2020 • Fan Wang, Jiangxin Yang, Yanlong Cao, Yanpeng Cao, Michael Ying Yang
Image Super-Resolution (SR) provides a promising technique to enhance the image quality of low-resolution optical sensors, facilitating better-performing target detection and autonomous navigation in a wide range of robotics applications.
no code implementations • 2 Nov 2020 • Michael Ying Yang, Saumya Kumaar, Ye Lyu, Francesco Nex
With the increasing demand for autonomous systems, pixelwise semantic segmentation for visual scene understanding needs to be not only accurate but also efficient for potential real-time applications.
2 code implementations • 30 Oct 2020 • Hao Cheng, Wentong Liao, Xuejiao Tang, Michael Ying Yang, Monika Sester, Bodo Rosenhahn
In our framework, first, the spatial context between agents is explored by using self-attention architectures.
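As a rough illustration of this step, a minimal self-attention block over per-agent feature vectors so that each agent's representation absorbs the spatial context of the other agents; the dimensions and the residual/normalization layout are assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class AgentSelfAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, agent_feats):              # (batch, num_agents, dim)
        ctx, _ = self.attn(agent_feats, agent_feats, agent_feats)
        return self.norm(agent_feats + ctx)      # residual connection + layer norm

feats = torch.randn(2, 5, 64)                    # 2 scenes, 5 agents each
print(AgentSelfAttention()(feats).shape)         # torch.Size([2, 5, 64])
```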
1 code implementation • 22 Jun 2020 • Yang Long, Gui-Song Xia, Shengyang Li, Wen Yang, Michael Ying Yang, Xiao Xiang Zhu, Liangpei Zhang, Deren Li
After reviewing existing benchmark datasets in the research community of RS image interpretation, this article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation.
1 code implementation • 15 Jun 2020 • Hao Cheng, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn, Monika Sester
Trajectory prediction is critical for applications of planning safe future movements and remains challenging even for the next few seconds in urban mixed traffic.
no code implementations • 28 May 2020 • Wentong Liao, Xiang Chen, Jingfeng Yang, Stefan Roth, Michael Goesele, Michael Ying Yang, Bodo Rosenhahn
This strengthens the local feature invariance for the resampled features and enables detecting vehicles in arbitrary orientations.
2 code implementations • 2 Mar 2020 • Ye Lyu, Michael Ying Yang, George Vosselman, Gui-Song Xia
As the tracker reuses the features from the detector, it is a very lightweight addition to the detection network.
1 code implementation • 14 Feb 2020 • Hao Cheng, Wentong Liao, Michael Ying Yang, Monika Sester, Bodo Rosenhahn
At inference time, we combine the past context and motion information of the target agent with samples of the latent variables to predict multiple realistic trajectories in the future.
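A minimal sketch of such latent-variable sampling at inference: the encoded past context is tiled, concatenated with several draws of a Gaussian latent variable, and each draw is decoded into one candidate future trajectory. The decoder below is a placeholder, not the authors' model.

```python
import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    def __init__(self, ctx_dim=64, z_dim=16, horizon=12):
        super().__init__()
        self.z_dim, self.horizon = z_dim, horizon
        self.mlp = nn.Sequential(
            nn.Linear(ctx_dim + z_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * 2))                     # (x, y) per future step

    def forward(self, context, num_samples=20):              # context: (B, ctx_dim)
        b = context.size(0)
        z = torch.randn(num_samples, b, self.z_dim, device=context.device)
        ctx = context.unsqueeze(0).expand(num_samples, -1, -1)
        out = self.mlp(torch.cat([ctx, z], dim=-1))          # decode each latent draw
        return out.view(num_samples, b, self.horizon, 2)

preds = TrajectoryDecoder()(torch.randn(4, 64))              # 20 futures for 4 agents
print(preds.shape)                                           # torch.Size([20, 4, 12, 2])
```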
1 code implementation • ECCV 2020 • Cong Yuren, Hanno Ackermann, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn
Detected objects, their labels and the discovered relations can be used to construct a scene graph which provides an abstract semantic interpretation of an image.
Ranked #8 on Scene Graph Generation on Visual Genome
3 code implementations • CVPR 2020 • Florian Kluger, Eric Brachmann, Hanno Ackermann, Carsten Rother, Michael Ying Yang, Bodo Rosenhahn
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
no code implementations • 9 Dec 2019 • Du Chen, Zewei He, Yanpeng Cao, Jiangxin Yang, Yanlong Cao, Michael Ying Yang, Siliang Tang, Yueting Zhuang
First, we propose a novel Orientation-Aware feature extraction and fusion Module (OAM), which contains a mixture of 1D and 2D convolutional kernels (i.e., 5x1, 1x5, and 3x3) for extracting orientation-aware features.
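A minimal sketch of such a kernel mixture, with parallel 5x1, 1x5, and 3x3 convolutions fused by a 1x1 convolution; the channel sizes and the fusion choice are assumptions rather than the released OAM.

```python
import torch
import torch.nn as nn

class OrientationAwareModule(nn.Module):
    def __init__(self, in_ch=64, out_ch=64):
        super().__init__()
        self.conv_v = nn.Conv2d(in_ch, out_ch, (5, 1), padding=(2, 0))  # vertical 5x1
        self.conv_h = nn.Conv2d(in_ch, out_ch, (1, 5), padding=(0, 2))  # horizontal 1x5
        self.conv_s = nn.Conv2d(in_ch, out_ch, 3, padding=1)            # square 3x3
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)                    # 1x1 fusion

    def forward(self, x):
        feats = torch.cat([self.conv_v(x), self.conv_h(x), self.conv_s(x)], dim=1)
        return self.fuse(feats)

x = torch.randn(1, 64, 32, 32)
print(OrientationAwareModule()(x).shape)   # torch.Size([1, 64, 32, 32])
```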
no code implementations • 30 Sep 2019 • Ye Lyu, George Vosselman, Gui-Song Xia, Michael Ying Yang
In recent years, the task of segmenting foreground objects from background in a video, i.e., video object segmentation (VOS), has received considerable attention.
1 code implementation • 23 Jul 2019 • Florian Kluger, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
The horizon line is an important geometric feature for many image processing and scene understanding tasks in computer vision.
Ranked #1 on Horizon Line Estimation on KITTI Horizon
no code implementations • Elsevier journal 2019 • Guizhong Fu, Peize Sun, Wenbin Zhu, Jiangxin Yang, Yanlong Cao, Michael Ying Yang, Yanpeng Cao
Automatic visual recognition of steel surface defects provides critical functionality to facilitate quality control of steel strip production.
no code implementations • 7 Apr 2019 • Dayan Guan, Xing Luo, Yanpeng Cao, Jiangxin Yang, Yanlong Cao, George Vosselman, Michael Ying Yang
In this paper, we propose a novel unsupervised domain adaptation framework for multispectral pedestrian detection, by iteratively generating pseudo annotations and updating the parameters of our designed multispectral pedestrian detector on target domain.
no code implementations • 3 Apr 2019 • Wentong Liao, Cuiling Lan, Wen-Jun Zeng, Michael Ying Yang, Bodo Rosenhahn
We further explore more powerful representations by integrating language prior with the visual context in the transformation for the scene graph generation.
no code implementations • 3 Apr 2019 • Sophie Crommelinck, Mila Koeva, Michael Ying Yang, George Vosselman
The delineation approach to which the evaluation framework is applied was previously introduced and is substantially improved in this study.
no code implementations • 14 Feb 2019 • Yanpeng Cao, Dayan Guan, Yulun Wu, Jiangxin Yang, Yanlong Cao, Michael Ying Yang
Effective fusion of complementary information captured by multi-modal sensors (visible and infrared cameras) enables robust pedestrian detection under various surveillance situations (e.g., daytime and nighttime).
no code implementations • 26 Oct 2018 • Michael Ying Yang, Wentong Liao, Chun Yang, Yanpeng Cao, Bodo Rosenhahn
The experimental results show that the proposed approach outperforms the state-of-the-art methods and is effective in recognizing complex security events.
3 code implementations • 24 Oct 2018 • Ye Lyu, George Vosselman, Gui-Song Xia, Alper Yilmaz, Michael Ying Yang
There already exist several semantic segmentation datasets for comparison among semantic segmentation methods in complex urban scenes, such as the Cityscapes and CamVid datasets, where the side views of the objects are captured with a camera mounted on the driving car.
no code implementations • 25 Jul 2018 • Zhenchao Zhang, Markus Gerke, George Vosselman, Michael Ying Yang
Due to the high cost of laser scanning, we want to explore the potential of using point clouds derived by dense image matching (DIM) as effective alternatives to laser scanning data.
1 code implementation • 25 Jul 2018 • Zhenchao Zhang, George Vosselman, Markus Gerke, Devis Tuia, Michael Ying Yang
Detecting topographic changes in the urban environment has always been an important task for urban planning and monitoring.
no code implementations • 27 Feb 2018 • Dayan Guan, Yanpeng Cao, Jun Liang, Yanlong Cao, Michael Ying Yang
Moreover, we utilized illumination information together with multispectral data to generate more accurate semantic segmentation, which is used to boost pedestrian detection accuracy.
no code implementations • 9 Feb 2018 • Lihang Liu, Weiyao Lin, Lisheng Wu, Yong Yu, Michael Ying Yang
This paper addresses the problem of unsupervised domain adaptation on the task of pedestrian detection in crowded scenes.
no code implementations • 9 Feb 2018 • Michael Ying Yang, Matthias Reso, Jun Tang, Wentong Liao, Bodo Rosenhahn
Therefore, we formulate a graphical model to select a proposal stream for each object, in which the pairwise potentials consist of the appearance dissimilarity between different streams in the same video and the similarity between streams in different videos.
1 code implementation • 9 Feb 2018 • Wentong Liao, Michael Ying Yang, Ni Zhan, Bodo Rosenhahn
Moreover, we trained the model jointly on six different datasets, which differs from the common practice of training a model on one dataset and testing it on the same one.
no code implementations • 9 Feb 2018 • Oliver Mueller, Michael Ying Yang, Bodo Rosenhahn
We propose to avoid dependence on a proposal distribution by introducing a slice sampling based PBP algorithm.
no code implementations • 9 Feb 2018 • Michael Ying Yang, Wentong Liao, Yanpeng Cao, Bodo Rosenhahn
In our framework, three levels of video events are connected by Hierarchical Dirichlet Process (HDP) model: low-level visual features, simple atomic activities, and multi-agent interactions.
no code implementations • 22 Jan 2018 • Michael Ying Yang, Wentong Liao, Xinbo Li, Bodo Rosenhahn
Also, the focal loss function is used in place of the conventional cross-entropy loss in both the region proposal network and the final classifier.
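For reference, a minimal sketch of the standard binary focal loss that this substitution refers to (the generic formulation, not the authors' exact implementation): easy examples are down-weighted so training focuses on hard ones.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # logits, targets: (N,) per-anchor binary classification (targets in {0, 1})
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()      # down-weight easy examples

loss = focal_loss(torch.randn(16), torch.randint(0, 2, (16,)).float())
print(loss.item())
```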
no code implementations • 16 Nov 2017 • Wentong Liao, Lin Shuai, Bodo Rosenhahn, Michael Ying Yang
Most of the existing works treat this task as a pure visual classification task: each type of relationship or phrase is classified as a relation category based on the extracted visual features.
no code implementations • 18 Sep 2017 • Christoph Reinders, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples.
no code implementations • 6 Sep 2017 • Sophie Crommelinck, Michael Ying Yang, Mila Koeva, Markus Gerke, Rohan Bennett, George Vosselman
This study proposes (i) a workflow that automatically extracts candidate cadastral boundaries from UAV orthoimages and (ii) a tool for their semi-automatic processing to delineate final cadastral boundaries.
2 code implementations • 8 Jul 2017 • Florian Kluger, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
We present a novel approach for vanishing point detection from uncalibrated monocular images.
Ranked #3 on Horizon Line Estimation on York Urban Dataset
no code implementations • 26 Feb 2017 • Omid Hosseini Jafari, Oliver Groth, Alexander Kirillov, Michael Ying Yang, Carsten Rother
Towards this end, we propose a Convolutional Neural Network (CNN) architecture that fuses the state-of-the-art results for depth estimation and semantic labeling.
no code implementations • 24 Jan 2017 • Michael Ying Yang, Hanno Ackermann, Weiyao Lin, Sitong Feng, Bodo Rosenhahn
In this paper, we propose a new framework for segmenting feature-based moving objects under affine subspace model.
no code implementations • 3 Oct 2016 • Omid Hosseini Jafari, Michael Ying Yang
We show that our method outperforms the state-of-the-art approaches.
no code implementations • 3 Oct 2016 • Siva Karthik Mustikovela, Michael Ying Yang, Carsten Rother
For state-of-the-art semantic segmentation, training convolutional neural networks (CNNs) requires dense pixelwise ground truth (GT) labeling, which is expensive and involves extensive human effort.
no code implementations • 19 Sep 2016 • Michael Ying Yang, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn
In contrast to previous methods for extracting support relations, the proposed approach generates more accurate results, and does not require a pixel-wise semantic labeling of the scene.
no code implementations • 16 Sep 2016 • Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
If these unknown subspaces are well-separated, this algorithm is guaranteed to succeed.
no code implementations • CVPR 2016 • Eric Brachmann, Frank Michel, Alexander Krull, Michael Ying Yang, Stefan Gumhold, Carsten Rother
In recent years, the task of estimating the 6D pose of object instances and complete scenes, i.e., camera localization, from a single input image has received considerable attention.
no code implementations • ICCV 2015 • Alexander Krull, Eric Brachmann, Frank Michel, Michael Ying Yang, Stefan Gumhold, Carsten Rother
This is done by describing the posterior density of a particular object pose with a convolutional neural network (CNN) that compares an observed and rendered image.
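A minimal sketch of the comparison idea: a small CNN takes an observed crop and a rendered crop of a pose hypothesis and outputs a scalar match score that can act as an unnormalized pose likelihood; the architecture below is a placeholder, not the paper's network.

```python
import torch
import torch.nn as nn

class PoseComparisonCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),   # observed + rendered RGB
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1))                                      # scalar match score

    def forward(self, observed, rendered):                         # each: (B, 3, H, W)
        return self.net(torch.cat([observed, rendered], dim=1))

score = PoseComparisonCNN()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(score.shape)   # torch.Size([1, 1])
```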
no code implementations • 6 Aug 2015 • Saif Dawood Salman Al-Shaikhli, Michael Ying Yang, Bodo Rosenhahn
A sparse representation of both global (region-based) and local (voxel-wise) image information is embedded in a level set formulation to define a new cost function.