2 code implementations • ICCV 2021 • Xinru Chen, Chengbo Dong, Jiaqi Ji, Juan Cao, Xirong Li
The key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, while being specific enough to prevent false alarms on authentic images.
Ranked #4 on Image Manipulation Localization on COVERAGE
2 code implementations • 16 Dec 2021 • Chengbo Dong, Xinru Chen, Ruohan Hu, Juan Cao, Xirong Li
As both clues are meant to be semantic-agnostic, the learned features are thus generalizable.
2 code implementations • 22 May 2018 • Xirong Li, Chaoxi Xu, Xiaoxu Wang, Weiyu Lan, Zhengxiong Jia, Gang Yang, Jieping Xu
This paper contributes to cross-lingual image annotation and retrieval in terms of data and baseline methods.
1 code implementation • CVPR 2019 • Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang
This paper attacks the challenging problem of zero-example video retrieval.
1 code implementation • 10 Sep 2020 • Jianfeng Dong, Xirong Li, Chaoxi Xu, Xun Yang, Gang Yang, Xun Wang, Meng Wang
In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.
Ranked #3 on Ad-hoc video search on TRECVID-AVS16 (IACC.3) (using extra training data)
1 code implementation • 5 Mar 2019 • Xueyao Zhang, Juan Cao, Xirong Li, Qiang Sheng, Lei Zhong, Kai Shu
Emotion plays an important role in detecting fake news online.
1 code implementation • 5 Sep 2017 • Jianfeng Dong, Xirong Li, Cees G. M. Snoek
This paper strives to find amidst a set of sentences the one best describing the content of a given image or video.
1 code implementation • 28 Mar 2015 • Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, Alberto del Bimbo
Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image.
1 code implementation • 28 Feb 2022 • Tianyun Yang, Ziyao Huang, Juan Cao, Lei Li, Xirong Li
With the rapid progress of generation technology, it has become necessary to attribute the origin of fake images.
1 code implementation • 16 Jul 2022 • Jiazhen Liu, Xirong Li, Qijie Wei, Jie Xu, Dayong Ding
To attack the incompleteness of manual labeling, we propose Progressive Keypoint Expansion to enrich the keypoint labels at each training epoch.
Ranked #2 on Image Registration on FIRE
1 code implementation • 26 Aug 2022 • Jianfeng Dong, Xianke Chen, Minsong Zhang, Xun Yang, ShuJie Chen, Xirong Li, Xun Wang
To fill the gap, we propose in this paper a novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR).
Ranked #1 on Partially Relevant Video Retrieval on TVR
1 code implementation • 21 Jun 2021 • Rui Qian, Xin Lai, Xirong Li
Autonomous driving is regarded as one of the most promising remedies to shield human beings from severe crashes.
1 code implementation • 3 Dec 2021 • Fan Hu, Aozhu Chen, Ziyue Wang, Fangming Zhou, Jianfeng Dong, Xirong Li
In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval.
Ranked #1 on Ad-hoc video search on TRECVID-AVS20 (V3C1) (using extra training data)
1 code implementation • 15 Aug 2017 • Weiyu Lan, Xirong Li, Jianfeng Dong
The framework comprises a module to automatically estimate the fluency of the sentences and another module to utilize the estimated fluency scores to effectively train an image captioning model for the target language.
1 code implementation • 21 Apr 2021 • Rui Qian, Xin Lai, Xirong Li
Specifically, instead of refining each proposal independently as previous works do, we represent each proposal as a node for graph construction within a given cut-off threshold, associating proposals in the form of a local neighborhood graph, with boundary correlations of an object being explicitly exploited.
1 code implementation • 8 Apr 2020 • Jianfeng Dong, Xun Wang, Leimin Zhang, Chaoxi Xu, Gang Yang, Xirong Li
Predicting the relevance between two given videos with respect to their visual content is a key component for content-based video recommendation and retrieval.
1 code implementation • 23 Jan 2022 • Jianfeng Dong, Yabing Wang, Xianke Chen, Xiaoye Qu, Xirong Li, Yuan He, Xun Wang
In this work, we concentrate on video representation learning, an essential component for text-to-video retrieval.
2 code implementations • 23 May 2017 • Bin Liang, Hongcheng Li, Miaoqiang Su, Xirong Li, Wenchang Shi, Xiao-Feng Wang
Consequently, the adversarial example can be effectively detected by comparing the classification results of a given sample and its denoised version, without referring to any prior knowledge of attacks.
1 code implementation • ACL 2021 • Qiang Sheng, Juan Cao, Xueyao Zhang, Xirong Li, Lei Zhong
By fusing event and pattern information, we select key sentences to represent an article and then predict if the article fact-checks the given claim using the claim, key sentences, and patterns.
1 code implementation • 25 Dec 2019 • Qijie Wei, Xirong Li, Weihong Yu, Xiao Zhang, Yongpeng Zhang, Bojie Hu, Bin Mo, Di Gong, Ning Chen, Dayong Ding, Youxin Chen
This paper attacks the three challenges in the context of diabetic retinopathy (DR) grading.
1 code implementation • 28 Jul 2019 • Weisen Wang, Zhiyan Xu, Weihong Yu, Jianchun Zhao, Jingyuan Yang, Feng He, Zhikun Yang, Di Chen, Dayong Ding, Youxin Chen, Xirong Li
The CNN's fusion layer is tailored to the need of fusing information from the fundus and OCT streams.
1 code implementation • 3 Dec 2020 • Weisen Wang, Xirong Li, Zhiyan Xu, Weihong Yu, Jianchun Zhao, Dayong Ding, Youxin Chen
Our MM-CNN is instantiated by a two-stream CNN, with spatially-invariant fusion to combine information from the CFP and OCT streams.
1 code implementation • ICCV 2023 • Jiazhen Liu, Xirong Li
Such a homography allows us to compute cross-attention in a focused manner, where key/value sets required by Transformers can be reduced to small fixed-size regions rather than an entire image.
1 code implementation • 5 Sep 2017 • Jianfeng Dong, Xirong Li, Duanqing Xu
To quantify the current progress, we propose a simple text2image method, representing a novel test query by a set of images selected from large-scale query log.
1 code implementation • 17 Mar 2022 • Guang Yang, Juan Cao, Qiang Sheng, Peng Qi, Xirong Li, Jintao Li
However, these methods have two limitations: 1) they neglect other important elements like scenes, textures, and objects beyond the capacity of pretrained object detectors; 2) the correlation among objects is fixed, but a fixed correlation is not appropriate for all the images.
1 code implementation • 24 Nov 2020 • Xirong Li, Fangming Zhou, Chaoxi Xu, Jiaqi Ji, Gang Yang
Inspired by the initial success of the few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders.
Ranked #2 on Ad-hoc video search on TRECVID-AVS16 (IACC.3) (using extra training data)
1 code implementation • 30 Apr 2022 • Ziyue Wang, Aozhu Chen, Fan Hu, Xirong Li
We propose a learning based method for training a negation-aware video retrieval model.
2 code implementations • 1 Apr 2021 • Jie Wang, Kaibin Tian, Dayong Ding, Gang Yang, Xirong Li
In this paper we extend UDA by proposing a new task called unsupervised domain expansion (UDE), which aims to adapt a deep model for the target domain with its unlabeled data, while maintaining the model's performance on the source domain.
Ranked #1 on Unsupervised Domain Expansion on UDE-DomainNet
1 code implementation • 4 Apr 2022 • Kaibin Tian, Qijie Wei, Xirong Li
Such sorts of samples are typically in minority in their host domain, so they tend to be overlooked by the domain-specific model, but can be better handled by a model from the other domain.
1 code implementation • 14 May 2023 • Qijie Wei, Jingyuan Yang, Bo Wang, Jinrui Wang, Jianchun Zhao, Xinyu Zhao, Sheng Yang, Niranchana Manivannan, Youxin Chen, Dayong Ding, Jing Zhou, Xirong Li
This paper addresses the emerging task of recognizing multiple retinal diseases from wide-field (WF) and ultra-wide-field (UWF) fundus images.
no code implementations • 26 Apr 2017 • Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi
In this paper, we present an effective method to craft text adversarial samples, revealing one important yet underestimated fact that DNN-based text classifiers are also prone to adversarial sample attack.
no code implementations • 23 Apr 2016 • Jianfeng Dong, Xirong Li, Cees G. M. Snoek
This paper strives to find the sentence best describing the content of an image or video.
no code implementations • 3 May 2016 • Xirong Li, Qin Jin
This paper describes our winning entry in the ImageCLEF 2015 image sentence generation task.
no code implementations • 27 Apr 2016 • Xirong Li, Yujia Huo, Jieping Xu, Qin Jin
We enrich the MediaEval 2015 violence dataset by manually labeling violence videos with respect to the subclasses.
no code implementations • 10 Oct 2015 • Masoud Mazloom, Xirong Li, Cees G. M. Snoek
We consider the problem of event detection in video for scenarios where only a few, or even zero, examples are available for training.
no code implementations • 13 Oct 2014 • Xirong Li
Due to the subjective nature of social tagging, measuring the relevance of social tags with respect to the visual content is crucial for retrieving the increasing amounts of social-networked images.
no code implementations • 17 Sep 2014 • Xixi He, Xirong Li, Gang Yang, Jieping Xu, Qin Jin
The key insight is to divide the vocabulary into two disjoint subsets, namely a seen set consisting of tags having ground truth available for optimizing their thresholds and a novel set consisting of tags without any ground truth.
no code implementations • 29 Oct 2018 • Gang Yang, Jinlu Liu, Xirong Li
Different from these existing types of methods, we propose a new method, sample construction, to deal with the problem of ZSL.
no code implementations • 20 Nov 2019 • Fei Ding, Gang Yang, Jinlu Liu, Jun Wu, Dayong Ding, Jie Xv, Gangwei Cheng, Xirong Li
Unlike previous self-attention based methods that capture context information from one level, we reformulate the self-attention mechanism from the view of the high-order graph and propose a novel method, namely Hierarchical Attention Network (HANet), to address the problem of medical image segmentation.
no code implementations • 31 Jan 2020 • Zhengxiong Jia, Xirong Li
In this paper we study a brand new topic of interactive image captioning with human in the loop.
no code implementations • 9 Dec 2020 • Aozhu Chen, Xinyi Huang, Hailan Lin, Xirong Li
For the first scenario with the references available, we propose two metrics, i.e., WMDRel and CLinRel.
no code implementations • 16 Jun 2021 • Tianyun Yang, Juan Cao, Qiang Sheng, Lei Li, Jiaqi Ji, Xirong Li, Sheng Tang
Adopting a multi-task framework, we propose a GAN Fingerprint Disentangling Network (GFD-Net) to simultaneously disentangle the fingerprint from GAN-generated images and produce a content-irrelevant representation for fake image attribution.
no code implementations • 25 Sep 2021 • Xirong Li, Yang Zhou, Jie Wang, Hailan Lin, Jianchun Zhao, Dayong Ding, Weihong Yu, Youxin Chen
We propose in this paper Multi-Modal Multi-Instance Learning (MM-MIL) for selectively fusing CFP and OCT modalities.
no code implementations • 24 Apr 2022 • Yue Wu, Yang Zhou, Jianchun Zhao, Jingyuan Yang, Weihong Yu, Youxin Chen, Xirong Li
Over 300 million people worldwide are affected by various retinal diseases.
no code implementations • 28 Nov 2022 • Xirong Li, Aozhu Chen, Ziyue Wang, Fan Hu, Kaibin Tian, Xinru Chen, Chengbo Dong
The 2022 edition of the TRECVID benchmark has again been a fruitful participation for the RUCMM team.
no code implementations • 2 Aug 2023 • Kaibin Tian, Ruixiang Zhao, Hu Hu, Runquan Xie, Fengzong Lian, Zhanhui Kang, Xirong Li
For efficient T2VR, we propose TeachCLIP with multi-grained teaching to let a CLIP4Clip based student network learn from more advanced yet computationally heavy models such as X-CLIP, TS2-Net and X-Pool.
no code implementations • 27 Aug 2023 • Jing Zhou, Xiaotong Fu, Xirong Li, Wei Feng, Zhang Zhang, Ying Ji
The most common type of lung cancer, lung adenocarcinoma (LUAD), has been increasingly detected since the advent of low-dose computed tomography screening technology.
no code implementations • ICCV 2023 • Zhihao Sun, Haoran Jiang, Danding Wang, Xirong Li, Juan Cao
Since image editing methods in real-world scenarios cannot be exhaustively enumerated, generalization is a core challenge for image manipulation detection, which could be severely weakened by semantically related features.
no code implementations • 17 Mar 2024 • Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li
The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs).