Search Results for author: Zhitong Xiong

Found 26 papers, 14 papers with code

Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

2 code implementations • 22 Mar 2024 • Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, Xiao Xiang Zhu

The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data.

Earth Observation


Causal Graph Neural Networks for Wildfire Danger Prediction

no code implementations • 13 Mar 2024 • Shan Zhao, Ioannis Prapas, Ilektra Karasante, Zhitong Xiong, Ioannis Papoutsis, Gustau Camps-Valls, Xiao Xiang Zhu

In that direction, we propose integrating causality with Graph Neural Networks (GNNs) that explicitly model the causal mechanism among complex variables via graph learning.

Decision Making • Graph Learning


ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models

no code implementations • 17 Feb 2024 • Zhenghang Yuan, Zhitong Xiong, Lichao Mou, Xiao Xiang Zhu

In this context, we introduce a global-scale, high-quality image-text dataset for remote sensing, providing natural language descriptions for Sentinel-2 data to facilitate the understanding of satellite imagery for common users.

Earth Observation • Semantic Segmentation


Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers

no code implementations • 31 Jan 2024 • Shan Zhao, Zhitong Xiong, Xiao Xiang Zhu

Subseasonal forecasting, which is pivotal for agriculture, water resource management, and early warning of disasters, faces challenges due to the chaotic nature of the atmosphere.

Weather Forecasting


SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

1 code implementation • 18 Jan 2024 • Yang Zhan, Zhitong Xiong, Yuan Yuan

Specifically, after projecting RS visual features to the language domain via an alignment layer, they are fed jointly with task-specific instructions into an LLM-based RS decoder to predict answers for RS open-ended tasks.

Instruction Following • Language Modelling • +2


One for All: Toward Unified Foundation Models for Earth Vision

no code implementations • 15 Jan 2024 • Zhitong Xiong, Yi Wang, Fahong Zhang, Xiao Xiang Zhu

Current remote sensing foundation models typically specialize in a single modality or a specific spatial resolution range, limiting their versatility for downstream datasets.


Mono3DVG: 3D Visual Grounding in Monocular Images

1 code implementation • 13 Dec 2023 • Yang Zhan, Yuan Yuan, Zhitong Xiong

To foster this task, we propose Mono3DVG-TR, an end-to-end transformer-based network, which takes advantage of both the appearance and geometry information in text embeddings for multi-modal learning and 3D object localization.

Object • Object Localization • +1


HTC-DC Net: Monocular Height Estimation from Single Remote Sensing Images

1 code implementation • 28 Sep 2023 • Sining Chen, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu

To tackle this problem, we propose a method for monocular height estimation from optical imagery, which is currently one of the richest sources of remote sensing data.

regression


Few-shot Object Detection in Remote Sensing: Lifting the Curse of Incompletely Annotated Novel Objects

1 code implementation • 19 Sep 2023 • Fahong Zhang, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu

In this context, few-shot object detection (FSOD) has emerged as a promising direction, which aims at enabling the model to detect novel objects with only a few of them annotated.

Few-Shot Object Detection • object-detection • +1


DeCUR: decoupling common & unique representations for multimodal self-supervision

2 code implementations • 11 Sep 2023 • Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu

We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning.

Scene Classification • Self-Supervised Learning • +1


Exploring Geometric Deep Learning For Precipitation Nowcasting

no code implementations • 11 Sep 2023 • Shan Zhao, Sudipan Saha, Zhitong Xiong, Niklas Boers, Xiao Xiang Zhu

Motivated by this, we explore a geometric deep learning-based temporal Graph Convolutional Network (GCN) for precipitation nowcasting.


Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval

1 code implementation • 24 Aug 2023 • Yuan Yuan, Yang Zhan, Zhitong Xiong

To address this issue, in this work, we investigate the parameter-efficient transfer learning (PETL) method to effectively and efficiently transfer visual-language knowledge from the natural domain to the RS domain on the image-text retrieval task.

Ranked #3 on Cross-Modal Retrieval on RSICD

Image-text matching • Retrieval • +2


PolyGNN: Polyhedron-based Graph Neural Network for 3D Building Reconstruction from Point Clouds

1 code implementation • 17 Jul 2023 • Zhaiyu Chen, Yilei Shi, Liangliang Nan, Zhitong Xiong, Xiao Xiang Zhu

We present PolyGNN, a polyhedron-based graph neural network for 3D building reconstruction from point clouds.

Node Classification


RSSOD-Bench: A large-scale benchmark dataset for Salient Object Detection in Optical Remote Sensing Imagery

no code implementations • 4 Jun 2023 • Zhitong Xiong, Yanfeng Liu, Qi Wang, Xiao Xiang Zhu

We present the RSSOD-Bench dataset for salient object detection (SOD) in optical remote sensing imagery.

object-detection • Object Detection • +1


GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

1 code implementation • 24 May 2023 • Zhitong Xiong, Sining Chen, Yi Wang, Lichao Mou, Xiao Xiang Zhu

Towards a fair and comprehensive analysis of existing methods, the proposed benchmark consists of 1) a large-scale dataset including co-registered RGB and nDSM pairs and pixel-wise semantic labels; 2) a comprehensive evaluation and analysis of existing multi-modal fusion strategies for both convolutional and Transformer-based networks on remote sensing data.

Ranked #1 on Semantic Segmentation on GAMUS

Segmentation • Semantic Segmentation


SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation

3 code implementations • 13 Nov 2022 • Yi Wang, Nassim Ait Ali Braham, Zhitong Xiong, Chenying Liu, Conrad M Albrecht, Xiao Xiang Zhu

Self-supervised pre-training bears potential to generate expressive representations without human annotation.

Ranked #1 on Multi-Label Image Classification on BigEarthNet (official test set) (using extra training data)

Earth Observation • Multi-Label Image Classification • +1


RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data

1 code implementation • 23 Oct 2022 • Yang Zhan, Zhitong Xiong, Yuan Yuan

However, the object-level visual grounding on RS images is still under-explored.

Image Captioning • Question Answering • +4


EarthNets: Empowering AI in Earth Observation

no code implementations • 10 Oct 2022 • Zhitong Xiong, Fahong Zhang, Yi Wang, Yilei Shi, Xiao Xiang Zhu

Furthermore, a new platform for EO, termed EarthNets, is released to achieve a fair and consistent evaluation of deep learning methods on remote sensing data.

Earth Observation • Scene Understanding • +1


Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation

1 code implementation • 30 Jul 2022 • Zhitong Xiong, Haopeng Li, Xiao Xiang Zhu

To address this problem, we propose to aggregate the learnable covariance matrices with a deformable 4D Transformer to effectively predict the segmentation map.

Ranked #1 on Few-Shot Semantic Segmentation on FSS-1000 (5-shot)

Few-Shot Semantic Segmentation • Segmentation • +2


Disentangled Latent Transformer for Interpretable Monocular Height Estimation

1 code implementation • 17 Jan 2022 • Zhitong Xiong, Sining Chen, Yilei Shi, Xiao Xiang Zhu

Furthermore, a novel unsupervised semantic segmentation task based on height estimation is first introduced in this work.

Unsupervised Semantic Segmentation


THE Benchmark: Transferable Representation Learning for Monocular Height Estimation

no code implementations • 30 Dec 2021 • Zhitong Xiong, Wei Huang, Jingtao Hu, Xiao Xiang Zhu

Therefore, we propose a new benchmark dataset to study the transferability of height estimation models in a cross-dataset setting.

Representation Learning • Transfer Learning


Change Detection Meets Visual Question Answering

1 code implementation • 12 Dec 2021 • Zhenghang Yuan, Lichao Mou, Zhitong Xiong, Xiaoxiang Zhu

In order to provide every user with flexible access to change information and help them better understand land-cover changes, we introduce a novel task: change detection-based visual question answering (CDVQA) on multi-temporal aerial images.

Answer Generation • Change Detection • +3


ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition

no code implementations • 14 Oct 2021 • Zhitong Xiong, Yuan Yuan, Qi Wang

Discriminative local theme-level and object-level representations can be selected with the DLFS module from the spatially-correlated multi-modal RGB-D features.

feature selection • Scene Classification • +1


CM-Net: Concentric Mask based Arbitrary-Shaped Text Detection

no code implementations • 30 Nov 2020 • Chuang Yang, Mulin Chen, Zhitong Xiong, Yuan Yuan, Qi Wang

Extensive experiments demonstrate the proposed CM is efficient and robust to fit arbitrary-shaped text instances, and also validate the effectiveness of MPF and constraints loss for discriminative text features recognition.

Text Detection


Variational Context-Deformable ConvNets for Indoor Scene Parsing

no code implementations • CVPR 2020 • Zhitong Xiong, Yuan Yuan, Nianhui Guo, Qi Wang

The main contributions of this work are as follows: 1) a novel VCD module is proposed, which exploits learnable Gaussian kernels to enable feature learning with structured adaptive-context; 2) variational Bayesian probabilistic modeling is introduced for the training of VCD module, which can make it continuous and more stable; 3) a perspective-aware guidance module is designed to take advantage of multi-modal information for RGB-D segmentation.

Ranked #1 on Scene Parsing on Cityscapes test

Scene Parsing • Segmentation • +1


VSSA-NET: Vertical Spatial Sequence Attention Network for Traffic Sign Detection

no code implementations • 5 May 2019 • Yuan Yuan, Zhitong Xiong, Qi Wang

Our contributions are as follows: 1) We propose a multi-resolution feature fusion network architecture which exploits densely connected deconvolution layers with skip connections, and can learn more effective features for the small size object; 2) We frame the traffic sign detection as a spatial sequence classification and regression task, and propose a vertical spatial sequence attention (VSSA) module to gain more context information for better detection performance.

object-detection • Object Detection • +1
