1 code implementation • 10 Apr 2025 • Bo Zhang, Hui Ma, Dailin Li, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin
Large language models (LLMs) demonstrate remarkable text comprehension and generation capabilities but often lack the ability to utilize up-to-date or domain-specific knowledge not included in their training data.
no code implementations • 16 Dec 2024 • Chang Xu, Ruixiang Zhang, Wen Yang, Haoran Zhu, Fang Xu, Jian Ding, Gui-Song Xia
Detecting oriented tiny objects, which are limited in appearance information yet prevalent in real-world applications, remains an intricate and under-explored problem.
no code implementations • 2 Sep 2024 • Guanyi Chen, Jian Ding, Shuyang Gong, Zhangsong Li
For the detection problem of distinguishing this model from a pair of independent Erdős–Rényi graphs with the same edge density $\mathcal{G}(n,\tfrac{\lambda s}{n})$, we focus on tests based on low-degree polynomials of the entries of the adjacency matrices, and we determine the threshold that separates the easy and hard regimes.
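As a toy illustration of the low-degree approach (not the paper's exact statistic, and with illustrative parameters far from its sparse regime), one can form a polynomial of both adjacency matrices that is invariant to the hidden relabeling, such as the product of centered triangle counts, which has mean roughly zero when the graphs are independent and positive mean when they share a parent:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pair(n, p, s, correlated):
    """Two graphs with marginal edge density p*s; if correlated, both are
    edge-subsamples (rate s) of one parent G(n, p), with a hidden relabeling."""
    if correlated:
        parent = np.triu(rng.random((n, n)) < p, 1)
        A = parent & np.triu(rng.random((n, n)) < s, 1)
        B = parent & np.triu(rng.random((n, n)) < s, 1)
        perm = rng.permutation(n)              # latent vertex correspondence
        B = B[np.ix_(perm, perm)]
    else:
        A = np.triu(rng.random((n, n)) < p * s, 1)
        B = np.triu(rng.random((n, n)) < p * s, 1)
    return (A | A.T).astype(float), (B | B.T).astype(float)

def signed_triangles(M, q):
    """Centered triangle count: a degree-3 polynomial of the adjacency entries."""
    C = M - q
    np.fill_diagonal(C, 0.0)
    return np.trace(C @ C @ C) / 6.0

# Degree-6 test statistic: product of the two centered triangle counts.
n, p, s = 400, 0.1, 0.8
for corr in (False, True):
    vals = [np.prod([signed_triangles(G, p * s) for G in sample_pair(n, p, s, corr)])
            for _ in range(20)]
    print("correlated:" if corr else "independent:", f"{np.mean(vals):.1f}")
```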
1 code implementation • 17 Jul 2024 • Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Mingchen Zhuge, Jian Ding, Deyao Zhu, Jürgen Schmidhuber, Mohamed Elhoseiny
This design of the retrieval mechanism enables Goldfish to efficiently process arbitrarily long video sequences, facilitating its application in contexts such as movies or television series.
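A minimal sketch of this retrieve-before-answering idea (the clip segmentation and the embedding encoder are assumed here; this shows the shape of the mechanism, not Goldfish's exact implementation):

```python
import numpy as np

def retrieve_clips(clip_embs, query_emb, k=3):
    """Return indices of the k clips most relevant to the question.
    clip_embs: (num_clips, d) clip embeddings; query_emb: (d,) question
    embedding. Both are assumed to come from some pretrained encoder."""
    c = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    scores = c @ q                        # cosine similarity to each clip
    return np.argsort(-scores)[:k]        # only these clips go to the answerer
```

Only the retrieved clips are then passed to the answering model, so the cost of a question grows with $k$ rather than with the full video length.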
1 code implementation • 28 Jun 2024 • Kirolos Ataallah, Chenhui Gou, Eslam Abdelrahman, Khushbu Pahwa, Jian Ding, Mohamed Elhoseiny
To address this gap, we introduce InfiniBench, a comprehensive benchmark for very long video understanding, which presents: 1) the longest video duration, averaging 52.59 minutes per video; 2) the largest number of question-answer pairs, 108.2K; 3) diversity in questions, which examine nine different skills and include both multiple-choice and open-ended questions; and 4) a human-centric design, as the video sources come from movies and daily TV shows, with human-level question designs such as Movie Spoiler Questions that require critical thinking and comprehensive understanding.
2 code implementations • 18 Jun 2024 • Xiang Li, Jian Ding, Mohamed Elhoseiny
We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images.
no code implementations • 10 Jun 2024 • Abdulwahab Felemban, Eslam Mohamed BAKR, Xiaoqian Shen, Jian Ding, Abduallah Mohamed, Mohamed Elhoseiny
We introduce iMotion-LLM: a Multimodal Large Language Model (LLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios.
no code implementations • 29 May 2024 • Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed BAKR, Mohamed Elhoseiny
Therefore, we propose two novel tasks: (1) Part-Aware Point Grounding, where the model directly predicts a part-level segmentation mask from user instructions, and (2) Part-Aware Point Grounded Captioning, where the model provides a detailed caption that includes part-level descriptions and their corresponding masks.
1 code implementation • 16 May 2024 • Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin
Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities.
1 code implementation • 16 May 2024 • Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu
Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world.
2 code implementations • 4 Apr 2024 • Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Deyao Zhu, Jian Ding, Mohamed Elhoseiny
This paper introduces MiniGPT4-Video, a multimodal Large Language Model (LLM) designed specifically for video understanding.
Ranked #3 on Zero-Shot Video Question Answer on TVQA
no code implementations • 5 Dec 2023 • Xiang Li, Jian Ding, Zhaoyang Chen, Mohamed Elhoseiny
In this work, we present Uni3DL, a unified model for 3D and Language understanding.
no code implementations • 16 Oct 2023 • Jian Ding, Yumou Fei, Yuanzheng Wang
In this paper, we study the problem of recovering the latent vertex correspondence between two correlated random graphs with vastly inhomogeneous and unknown edge probabilities between different pairs of vertices.
1 code implementation • 13 Sep 2023 • Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li
Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio?
1 code implementation • 29 Aug 2023 • Haodong He, Jian Ding, Bowen Xu, Gui-Song Xia
The benchmarks we propose and our comprehensive experimental analyses can facilitate research on robust object detection in aerial images.
1 code implementation • 23 Jul 2023 • Guopeng Li, Yue Xu, Jian Ding, Gui-Song Xia
To this end, we propose a generic white-box attack, LGP (local perturbations with adaptively global attacks), to blind mainstream object detectors with controllable perturbations.
no code implementations • 1 Jun 2023 • Jian Ding, Zhangsong Li
We propose an efficient algorithm for matching two correlated Erdős–Rényi graphs with $n$ vertices whose edges are correlated through a latent vertex correspondence.
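For intuition about the task, a deliberately naive baseline (not the paper's algorithm, and with illustrative parameters) matches vertices of the two graphs by sorting degrees; even at high correlation this recovers only a small fraction of the hidden correspondence, which is why more refined neighborhood statistics are needed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated subsamples (rate s) of a common parent G(n, p).
n, p, s = 500, 0.5, 0.95
parent = np.triu(rng.random((n, n)) < p, 1)
A = parent & np.triu(rng.random((n, n)) < s, 1)
B = parent & np.triu(rng.random((n, n)) < s, 1)
A, B = A | A.T, B | B.T
perm = rng.permutation(n)                  # hidden correspondence
B = B[np.ix_(perm, perm)]                  # vertex i of B is vertex perm[i] of A

est = np.empty(n, dtype=int)
est[np.argsort(-B.sum(1))] = np.argsort(-A.sum(1))   # match equal degree ranks
print(f"fraction correctly matched: {np.mean(est == perm):.3f}")
```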
1 code implementation • CVPR 2023 • Jian Ding, Nan Xue, Gui-Song Xia, Bernt Schiele, Dengxin Dai
This work studies semantic segmentation under the domain generalization setting, where a model is trained only on the source domain and tested on the unseen target domain.
1 code implementation • CVPR 2024 • Zhikai Zhang, Jian Ding, Li Jiang, Dengxin Dai, Gui-Song Xia
Based on the point features, we perform a bottom-up multicut algorithm to segment point clouds into coarse instance masks as pseudo labels, which are used to train a point cloud instance segmentation model.
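A simplified sketch of this bottom-up grouping step (a greedy thresholded-similarity stand-in for the multicut optimization; the k-NN construction, cosine threshold, and feature source are all illustrative assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def coarse_instance_masks(points, feats, k=8, sim_thresh=0.9):
    """points: (n, 3) coordinates; feats: (n, d) learned point features.
    Link each point to its k nearest spatial neighbors, keep links whose
    features agree, and take connected components as pseudo instances."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]                # k nearest neighbors
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    rows, cols = np.repeat(np.arange(n), k), nbrs.ravel()
    keep = (f[rows] * f[cols]).sum(1) > sim_thresh      # feature-similarity gate
    adj = csr_matrix((np.ones(keep.sum()), (rows[keep], cols[keep])), shape=(n, n))
    _, labels = connected_components(adj, directed=False)
    return labels                                       # pseudo instance id per point
```

The actual pipeline optimizes a multicut objective over such a graph rather than hard-thresholding edges, but the output has the same form: one coarse instance id per point, usable as a pseudo label.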
1 code implementation • CVPR 2023 • Chang Xu, Jian Ding, Jinwang Wang, Wen Yang, Huai Yu, Lei Yu, Gui-Song Xia
Despite the exploration of adaptive label assignment in recent oriented object detectors, the extreme geometry shape and limited feature of oriented tiny objects still induce severe mismatch and imbalance issues.
Ranked #4 on Oriented Object Detection on DOTA 2.0
1 code implementation • 31 Jan 2023 • Jiaming Han, Yuqiang Ren, Jian Ding, Ke Yan, Gui-Song Xia
As few-shot object detectors are often trained with abundant base samples and fine-tuned on few-shot novel examples, the learned models are usually biased to base classes and sensitive to the variance of novel examples.
1 code implementation • 26 Jan 2023 • Chao Pang, Jiang Wu, Jian Ding, Can Song, Gui-Song Xia
The tilted viewing nature of the off-nadir aerial images brings severe challenges to the building change detection (BCD) problem: the mismatch of the nearby buildings and the semantic ambiguity of the building facades.
Tasks: Building change detection for remote sensing images, Change Detection
no code implementations • 28 Dec 2022 • Jian Ding, Zhangsong Li
Motivated by the problem of matching vertices in two correlated Erdős–Rényi graphs, we study the problem of matching two correlated Gaussian Wigner matrices.
no code implementations • 29 May 2022 • Jian Ding, Hang Du
For two correlated graphs which are independently sub-sampled from a common Erdős–Rényi graph $\mathbf{G}(n, p)$, we wish to recover their latent vertex matching from the observation of these two graphs without labels.
no code implementations • 28 Mar 2022 • Jian Ding, Hang Du
The problem of detecting edge correlation between two Erdős–Rényi random graphs on $n$ unlabeled nodes can be formulated as a hypothesis testing problem: under the null hypothesis, the two graphs are sampled independently; under the alternative, the two graphs are independently sub-sampled from a parent graph which is Erdős–Rényi $\mathbf{G}(n, p)$ (so that their marginal distributions are the same as under the null).
1 code implementation • CVPR 2022 • Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-Song Xia
Thus, unknown objects in low-density regions can be easily identified with the learned unknown probability.
1 code implementation • CVPR 2022 • Jian Ding, Nan Xue, Gui-Song Xia, Dengxin Dai
This work decouples zero-shot semantic segmentation into: 1) a class-agnostic grouping task that groups pixels into segments, and 2) a zero-shot classification task on segments.
1 code implementation • 30 Aug 2021 • Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang, Yi Zuo, Licheng Jiao, Chang Meng, Hao Wang, Jiahao Wang, Yiming Hui, Zhuojun Dong, Jie Zhang, Qianyue Bao, Zixiao Zhang, Fang Liu
This report summarizes the results of the Learning to Understand Aerial Images (LUAI) 2021 challenge held at ICCV 2021, which focused on object detection and semantic segmentation in aerial images.
no code implementations • 17 Mar 2021 • Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang
Conversely, if $\sqrt{d} B(\mathcal{P},\mathcal{Q}) \ge 1+\epsilon$ for an arbitrarily small constant $\epsilon>0$, the reconstruction error for any estimator is shown to be bounded away from $0$ under both the sparse and dense model, resolving the conjecture in [Moharrami et al. 2019, Semerjian et al. 2020].
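Here $B$ measures the overlap between the planted weight distribution $\mathcal{P}$ and the null distribution $\mathcal{Q}$; assuming, as is standard in this line of work, that it denotes the Bhattacharyya coefficient,

$$B(\mathcal{P}, \mathcal{Q}) = \int \sqrt{\mathrm{d}\mathcal{P}\,\mathrm{d}\mathcal{Q}} \in (0, 1],$$

with $B = 1$ if and only if $\mathcal{P} = \mathcal{Q}$. The condition $\sqrt{d}\, B(\mathcal{P},\mathcal{Q}) \ge 1+\epsilon$ thus says that the two distributions overlap too much, relative to the average degree $d$, for the planted structure to be reconstructed.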
4 code implementations • CVPR 2021 • Jiaming Han, Jian Ding, Nan Xue, Gui-Song Xia
More precisely, we incorporate rotation-equivariant networks into the detector to extract rotation-equivariant features, which can accurately predict the orientation and lead to a large reduction in model size.
Ranked #21 on Object Detection In Aerial Images on DOTA (using extra training data)
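A toy check of what rotation equivariance means for a feature extractor (this orbit-pooled correlation layer is a stand-in for illustration only, not the paper's rotation-equivariant network):

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(2)

def orbit_layer(x, filt):
    """Correlate the image with all four 90-degree rotations of one filter
    and max-pool over the orbit; the output rotates along with the input."""
    resp = [correlate2d(x, np.rot90(filt, r), mode="same") for r in range(4)]
    return np.max(resp, axis=0)

x, filt = rng.random((16, 16)), rng.random((3, 3))
lhs = orbit_layer(np.rot90(x), filt)   # rotate the input, then extract features
rhs = np.rot90(orbit_layer(x, filt))   # extract features, then rotate them
print(np.allclose(lhs, rhs))           # True: the features are rotation-equivariant
```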
no code implementations • 8 Mar 2021 • Jian Ding, Enze Xie, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia
Unsupervised pre-training aims at learning transferable features that are beneficial for downstream tasks.
2 code implementations • 24 Feb 2021 • Jian Ding, Nan Xue, Gui-Song Xia, Xiang Bai, Wen Yang, Michael Ying Yang, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang
In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI.
2 code implementations • ICCV 2021 • Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo
Unlike most recent methods that focus on improving the accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between a global image and local image patches to learn discriminative representations for object detection.
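The global-to-local contrast can be sketched as an InfoNCE objective between each image's global embedding and an aggregate of its own patch embeddings; the heads, shapes, and single positive per image here are simplifying assumptions, not DetCo's exact multi-level losses:

```python
import numpy as np

def global_local_info_nce(global_embs, patch_embs, tau=0.1):
    """global_embs, patch_embs: (batch, d). Row i of each comes from the
    same image, so the positives sit on the diagonal of the score matrix."""
    g = global_embs / np.linalg.norm(global_embs, axis=1, keepdims=True)
    p = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    logits = g @ p.T / tau                           # all image-patch pairs
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()                 # pull own patches closer
```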
3 code implementations • 21 Aug 2020 • Jiaming Han, Jian Ding, Jie Li, Gui-Song Xia
However, most existing methods rely on heuristically defined anchors with different scales, angles, and aspect ratios, and usually suffer from severe misalignment between anchor boxes and axis-aligned convolutional features, which leads to a common inconsistency between the classification score and localization accuracy.
Ranked #26 on Object Detection In Aerial Images on DOTA (using extra training data)
no code implementations • 6 Mar 2020 • Jiwei Jia, Jian Ding, Siyu Liu, Guidong Liao, Jingzhi Li, Ben Duan, Guoqing Wang, Ran Zhang
Home quarantine is the most important measure for preventing the spread of COVID-19.
no code implementations • 18 Nov 2019 • Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang
Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise.
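A toy instance of this recovery problem (the ring-structured hidden graph, the Gaussian choices of $P_n$ and $Q_n$, and the mutual-top-$2k$ estimator are all illustrative assumptions, far from the regime the paper analyzes):

```python
import numpy as np

rng = np.random.default_rng(3)

n, k, mu = 200, 2, 3.0
hidden = np.zeros((n, n), dtype=bool)
idx = np.arange(n)
for j in range(1, k + 1):                  # ring: each vertex links to its k
    hidden[idx, (idx + j) % n] = True      # successors and, by symmetry, its k
    hidden[(idx + j) % n, idx] = True      # predecessors -- a 2k-NN graph
W = rng.normal(size=(n, n))                # Q_n = N(0, 1) off the hidden graph
W[hidden] += mu                            # P_n = N(mu, 1) on hidden edges
W = np.triu(W, 1)
W = W + W.T                                # symmetric weights, zero diagonal

picks = np.argsort(-W, axis=1)[:, :2 * k]  # each vertex keeps its top-2k edges
est = np.zeros_like(hidden)
est[idx[:, None], picks] = True
est = est & est.T                          # keep mutually selected edges only
print(f"misclassified / hidden edges: {(est ^ hidden).sum() / hidden.sum():.3f}")
```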
2 code implementations • CVPR 2019 • Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu
Object detection in aerial images is an active yet challenging task in computer vision because of the bird's-eye view perspective, the highly complex backgrounds, and the variant appearances of objects.
1 code implementation • 1 Dec 2018 • Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu
Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between Regions of Interest (RoIs) and objects.
Ranked #54 on Object Detection In Aerial Images on DOTA (using extra training data)
1 code implementation • 19 Nov 2018 • Jian Ding, Zongming Ma, Yihong Wu, Jiaming Xu
This work develops an $\tilde{O}(n d^2+n^2)$-time algorithm which perfectly recovers the true vertex correspondence with high probability, provided that the average degree is at least $d = \Omega(\log^2 n)$ and the two graphs differ by at most a $\delta = O(\log^{-2} n)$ fraction of edges.
no code implementations • 15 Apr 2018 • Vivek Bagaria, Jian Ding, David Tse, Yihong Wu, Jiaming Xu
Represented as bicolored multi-graphs, these extreme points are further decomposed into simpler "blossom-type" structures for the large deviation analysis and counting arguments.
6 code implementations • CVPR 2018 • Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang
The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral.
Ranked #58 on Object Detection In Aerial Images on DOTA (using extra training data)
no code implementations • 18 May 2014 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values.
no code implementations • 11 Oct 2013 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
We prove that the player's $T$-round minimax regret in this setting is $\widetilde{\Theta}(T^{2/3})$, thereby closing a fundamental gap in our understanding of learning with bandit feedback.