2 code implementations • 29 Sep 2024 • Haolin Li, YuHang Zhou, Ziheng Zhao, Siyuan Du, Jiangchao Yao, Weidi Xie, Ya Zhang, Yanfeng Wang
To accomplish the above objective, we propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
3D Medical Imaging Segmentation • Medical Image Classification
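The core mechanism, a shared weight plus a per-task low-rank expert update, can be sketched in numpy (an illustrative toy with a linear layer, not the paper's actual convolutional architecture; all shapes and names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_tasks = 64, 64, 4, 3

# Shared backbone weight plus one low-rank expert (B_t, A_t) per task;
# rank << d keeps per-task parameters small, and each task's gradient
# flows through its own disjoint factors.
W_shared = rng.normal(size=(d_out, d_in)) * 0.02
experts = [(rng.normal(size=(d_out, rank)) * 0.02,   # B_t
            rng.normal(size=(rank, d_in)) * 0.02)    # A_t
           for _ in range(n_tasks)]

def forward(x, task_id):
    """Route the input through the shared weight plus the task's low-rank expert."""
    B, A = experts[task_id]
    return x @ (W_shared + B @ A).T

x = rng.normal(size=(8, d_in))
ys = [forward(x, t) for t in range(n_tasks)]
```

Each task thus shares most parameters while its expert update stays cheap: the expert adds `rank * (d_in + d_out)` parameters instead of `d_in * d_out`.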
no code implementations • 26 Aug 2024 • Qirui Chen, Shangzhe Di, Weidi Xie
Trained on our visual instruction data, GeLM demonstrates improved multi-hop grounding and reasoning capabilities, setting a new baseline for this challenging task.
no code implementations • 24 Aug 2024 • Görkay Aydemir, Weidi Xie, Fatma Güney
Large-scale vision foundation models have demonstrated remarkable success across various tasks, underscoring their robust generalization capabilities.
1 code implementation • 22 Aug 2024 • Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie
To promote further advancements in the application of LLMs to clinical challenges, we have made the MedS-Ins dataset fully accessible and invite the research community to contribute to its expansion. Additionally, we have launched a dynamic leaderboard for MedS-Bench, whose test set we plan to update regularly to track progress and enhance the adaptation of general LLMs to the medical domain.
no code implementations • 23 Jul 2024 • Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li
To address these challenges, we initiate a series of work on grounded Automatic Report Generation (AutoRG), starting from the brain MRI interpretation system, which supports the delineation of brain structures, the localization of anomalies, and the generation of well-organized findings.
1 code implementation • 22 Jul 2024 • Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Our objective is to generate Audio Descriptions (ADs) for both movies and TV series in a training-free manner.
no code implementations • 17 Jul 2024 • Yibin Yan, Weidi Xie
Knowledge-based Visual Question Answering (KVQA) tasks require answering questions about images using extensive background knowledge.
2 code implementations • 16 Jul 2024 • Cilin Yan, Haochen Wang, Shilin Yan, XiaoLong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves
In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
1 code implementation • 27 Jun 2024 • Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, XiaoLong Jiang, Yao Hu, Weidi Xie
This effectively enables the model to discern AI-generated images based on semantics or contextual information; secondly, we select the highest-frequency and lowest-frequency patches in the image and compute their low-level patchwise features, aiming to detect AI-generated images by low-level artifacts, for example, noise patterns and anti-aliasing.
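The frequency-based patch selection described above can be sketched with a 2D FFT in numpy (a hedged toy illustration; the paper's actual feature extraction will differ):

```python
import numpy as np

def patch_frequency_energy(img, patch=8):
    """Score each non-overlapping patch by its high-frequency FFT energy."""
    h, w = img.shape
    scores = {}
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = img[i:i + patch, j:j + patch]
            mag = np.abs(np.fft.fftshift(np.fft.fft2(p)))
            c = patch // 2
            # Energy outside the central (low-frequency) 3x3 region
            # counts as "high frequency".
            low = mag[c - 1:c + 2, c - 1:c + 2].sum()
            scores[(i, j)] = mag.sum() - low
    return scores

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))
scores = patch_frequency_energy(img)
top = max(scores, key=scores.get)      # highest-frequency patch location
bottom = min(scores, key=scores.get)   # lowest-frequency patch location
```

The highest- and lowest-scoring patch locations would then be cropped and fed to a low-level feature extractor.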
1 code implementation • 26 Jun 2024 • Jiayuan Rao, HaoNing Wu, Chang Liu, Yanfeng Wang, Weidi Xie
Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience.
1 code implementation • 24 Jun 2024 • Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models.
no code implementations • 3 Jun 2024 • Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang
We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images.
no code implementations • 25 Apr 2024 • Charig Yang, Weidi Xie, Andrew Zisserman
We also introduce a transformer-based model for ordering of image sequences of arbitrary length with built-in attribution maps.
1 code implementation • 25 Apr 2024 • Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie
We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by enabling training to generate text based on given segmentation regions, which is unattainable with previous relevant datasets.
no code implementations • 22 Apr 2024 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.
1 code implementation • 18 Apr 2024 • Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman
The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video.
1 code implementation • 15 Apr 2024 • Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang
In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain-specific knowledge in pathology.
1 code implementation • 21 Feb 2024 • Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions.
no code implementations • CVPR 2024 • Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma
The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.
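The alignment objective, matching region features to category-name text embeddings under detector supervision, resembles a temperature-scaled contrastive/cross-entropy loss; a minimal numpy sketch, with all dimensions and the temperature value assumed for illustration:

```python
import numpy as np

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_regions, n_classes, dim = 5, 10, 32

region_feats = l2norm(rng.normal(size=(n_regions, dim)))      # regional visual features
text_embeds = l2norm(rng.normal(size=(n_classes, dim)))       # category-name embeddings
detector_labels = rng.integers(0, n_classes, size=n_regions)  # pseudo-labels from a detector

# Temperature-scaled cosine similarities, then cross-entropy against
# the detector's pseudo-labels.
logits = region_feats @ text_embeds.T / 0.07
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(n_regions), detector_labels].mean()
```

Minimising this loss pulls each region's feature toward the text embedding of its (pseudo-)category, which is what lets novel category names act as detection queries at test time.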
2 code implementations • 29 Jan 2024 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman
Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse.
1 code implementation • CVPR 2024 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.
no code implementations • CVPR 2024 • Jilan Xu, Yifei HUANG, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos to enhance the video captioning of egocentric videos.
no code implementations • CVPR 2024 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.
1 code implementation • CVPR 2024 • Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman
In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.
1 code implementation • 28 Dec 2023 • Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
Our main contributions are threefold: (i) for dataset construction, we build the first multi-modal knowledge tree on human anatomy, covering 6502 anatomical terminologies; we then assemble the largest and most comprehensive segmentation dataset for training, by collecting over 22K 3D medical image scans from 72 segmentation datasets across 497 classes, with careful standardization of both image scans and label space; (ii) for architecture design, we propose injecting medical knowledge into a text encoder via contrastive learning, and then formulate a universal segmentation model that can be prompted by feeding in medical terminologies in text form; (iii) as a result, we train SAT-Nano (110M parameters) and SAT-Pro (447M parameters), demonstrating performance comparable to 72 specialist nnU-Nets trained on each dataset/subset.
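Contribution (ii), injecting knowledge into the text encoder via contrastive learning, can be illustrated with a symmetric InfoNCE-style objective over matched (terminology, knowledge-text) embedding pairs; a numpy sketch under assumed dimensions, not the paper's exact formulation:

```python
import numpy as np

def info_nce(a, b, tau=0.07):
    """Contrastive loss over matched pairs (a[i] <-> b[i]): each row of `a`
    must pick out its own partner in `b` among all candidates."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau
    n = len(a)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(n), np.arange(n)].mean()

rng = np.random.default_rng(0)
term_emb = rng.normal(size=(8, 64))                    # anatomical-terminology embeddings
defn_emb = term_emb + 0.1 * rng.normal(size=(8, 64))   # embeddings of their knowledge text
loss = info_nce(term_emb, defn_emb)
```

Because the paired embeddings here are deliberately close, the loss is near zero; training drives real term/knowledge pairs toward the same behaviour.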
1 code implementation • 26 Dec 2023 • Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Lisong Dai, Hengyu Guan, Yuehua Li, Ya Zhang, Yanfeng Wang, Weidi Xie
Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics.
no code implementations • 21 Dec 2023 • Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie
In this paper, we aim to establish an automatic, scalable pipeline for denoising large-scale instructional datasets and construct a high-quality video-text dataset, named HowToStep, with multiple descriptive steps as supervision.
no code implementations • 18 Dec 2023 • Junyu Xie, Weidi Xie, Andrew Zisserman
The goal of this paper is to discover, segment, and track independently moving objects in complex visual scenes.
1 code implementation • CVPR 2024 • Shangzhe Di, Weidi Xie
Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability in certain fields, such as robotics.
1 code implementation • 15 Oct 2023 • Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public.
1 code implementation • 10 Oct 2023 • Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman
To this end, we make the following contributions: (i) We introduce a general and lightweight protocol to evaluate whether features of an off-the-shelf large vision model encode a number of physical 'properties' of the 3D scene, by training discriminative classifiers on the features for these properties.
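The linear-probing protocol in contribution (i) amounts to training a lightweight classifier on frozen features; a self-contained numpy sketch with synthetic stand-in features (the real protocol uses features from an off-the-shelf vision model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen features (random stand-ins here) and a binary physical
# "property" label per sample, e.g. occluded vs. not occluded.
X = rng.normal(size=(200, 16))
w_true = rng.normal(size=16)
y = (X @ w_true > 0).astype(float)

# Train a logistic-regression probe on the frozen features via
# plain gradient descent; the backbone itself is never updated.
w = np.zeros(16)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

acc = (((X @ w) > 0) == y).mean()
```

High probe accuracy is then read as evidence that the frozen features linearly encode the property in question.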
no code implementations • 10 Oct 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
no code implementations • 20 Sep 2023 • Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie
Recently, the AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets.
1 code implementation • 13 Sep 2023 • Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang
Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, for which a range of computer-aided artificial intelligence methods have been proposed.
no code implementations • ICCV 2023 • Hala Lamdouar, Weidi Xie, Andrew Zisserman
We also incorporate the proposed camouflage score into a generative model as an auxiliary loss and show that effective camouflage images or videos can be synthesised in a scalable manner.
1 code implementation • 16 Aug 2023 • Fangrui Zhu, Yiming Xie, Weidi Xie, Huaizu Jiang
To address this issue, in this paper, we introduce a diagnosis toolbox to provide detailed quantitative breakdown analysis of HOI detection models, inspired by the success of object detection diagnosis toolboxes.
1 code implementation • ICCV 2023 • Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya Zhang, Yanfeng Wang
Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people.
1 code implementation • 4 Aug 2023 • Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
In this study, we aim to initiate the development of a Radiology Foundation Model, termed RadFM.
1 code implementation • 24 Jun 2023 • HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang
Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.
1 code implementation • 13 Jun 2023 • Gyungin Shin, Weidi Xie, Samuel Albanie
In this paper, we propose to meet this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources.
1 code implementation • 12 Jun 2023 • Yikun Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's means of expression.
Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRR
no code implementations • 8 Jun 2023 • Prannay Kaul, Weidi Xie, Andrew Zisserman
The goal of this paper is open-vocabulary object detection (OVOD) -- building a model that can detect objects beyond the set of categories seen at training, thus enabling the user to specify categories of interest at inference without the need for model retraining.
1 code implementation • 1 Jun 2023 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
no code implementations • 18 May 2023 • Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie
The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks.
2 code implementations • 17 May 2023 • Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret and answer questions based on medical images.
Ranked #1 on Medical Visual Question Answering on PMC-VQA
1 code implementation • 27 Apr 2023 • Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model to the medical domain; this involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning.
1 code implementation • 27 Apr 2023 • Gyungin Shin, Samuel Albanie, Weidi Xie
Segmentation is a core computer vision competency, with applications spanning a broad range of scientifically and economically valuable domains.
1 code implementation • ICCV 2023 • Haochen Wang, Cilin Yan, Shuai Wang, XiaoLong Jiang, Xu Tang, Yao Hu, Weidi Xie, Efstratios Gavves
Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos.
1 code implementation • CVPR 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.
1 code implementation • CVPR 2023 • Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang
Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems.
no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie
In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even categories not seen at training time.
1 code implementation • 13 Mar 2023 • Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Foundation models trained on large-scale datasets have recently gained a surge of interest in CV and NLP.
Ranked #3 on Medical Visual Question Answering on PMC-VQA
1 code implementation • 27 Feb 2023 • Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge.
no code implementations • 22 Feb 2023 • Chaoyi Wu, Xiaoman Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie
In this paper, we consider the problem of disease diagnosis.
1 code implementation • CVPR 2023 • Keyan Chen, XiaoLong Jiang, Yao Hu, Xu Tang, Yan Gao, Jianqi Chen, Weidi Xie
In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario.
Ranked #1 on Open Vocabulary Attribute Detection on OVAD benchmark (using extra training data)
1 code implementation • CVPR 2023 • Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie
The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities.
Open-Vocabulary Semantic Segmentation +1
1 code implementation • ICCV 2023 • Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of a segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt.
no code implementations • 5 Jan 2023 • Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice.
no code implementations • ICCV 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
no code implementations • ICCV 2023 • Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice.
1 code implementation • 27 Oct 2022 • Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya Zhang, Weidi Xie
When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.
1 code implementation • 18 Oct 2022 • Guanqi Zhan, Weidi Xie, Andrew Zisserman
To this end we make the following four contributions: (1) We propose a simple 'plugin' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.
Ranked #1 on Instance Segmentation on Separated COCO
2 code implementations • 13 Oct 2022 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman
This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.
no code implementations • 10 Oct 2022 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is an efficient training method for video tasks.
no code implementations • 7 Oct 2022 • Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang
Existing super-resolution models are often specialized for a single scale, fundamentally limiting their use in practical scenarios.
no code implementations • 1 Oct 2022 • Shuangrui Ding, Weidi Xie, Yabo Chen, Rui Qian, Xiaopeng Zhang, Hongkai Xiong, Qi Tian
In this paper, we consider the task of unsupervised object discovery in videos.
Ranked #3 on Unsupervised Object Segmentation on DAVIS 2016
1 code implementation • 22 Sep 2022 • Gyungin Shin, Weidi Xie, Samuel Albanie
Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images.
no code implementations • 12 Sep 2022 • Pak-Hei Yeung, Moska Aliasi, Monique Haak, the INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete
Two-dimensional (2D) freehand ultrasound is the mainstay in prenatal care and fetal growth monitoring.
1 code implementation • 29 Aug 2022 • Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie
In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e., zero-shot or few-shot counting.
Ranked #2 on Exemplar-Free Counting on FSC147
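A common way to realise exemplar-conditioned counting is to correlate exemplar descriptors against the image feature map; a numpy sketch of that idea (illustrative only, not the paper's architecture), covering both the few-shot and the exemplar-free case:

```python
import numpy as np

def similarity_map(feat_map, exemplar_vec):
    """Cosine similarity between each spatial feature and the exemplar descriptor."""
    f = feat_map / np.linalg.norm(feat_map, axis=-1, keepdims=True)
    e = exemplar_vec / np.linalg.norm(exemplar_vec)
    return f @ e

rng = np.random.default_rng(0)
H, W, D = 16, 16, 32
feat_map = rng.normal(size=(H, W, D))

# Few-shot: average the descriptors of the exemplar boxes.
# Zero-shot: no exemplar, so fall back to a generic query
# (here simply the mean feature; a real model would learn it).
exemplars = rng.normal(size=(3, D))
sim_few = similarity_map(feat_map, exemplars.mean(axis=0))
sim_zero = similarity_map(feat_map, feat_map.mean(axis=(0, 1)))
```

A decoder would then turn such similarity maps into a density map whose sum gives the predicted count.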
no code implementations • 20 Aug 2022 • Wentao Liu, Chaofan Ma, Yuhuan Yang, Weidi Xie, Ya Zhang
The goal of this paper is to interactively refine the automatic segmentation of challenging structures that fall behind human performance, either due to the scarcity of available annotations or the difficult nature of the problem itself, for example, segmenting cancer or small organs.
no code implementations • 8 Aug 2022 • Yue Hu, Shaoheng Fang, Weidi Xie, Siheng Chen
To fill this gap, this work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space.
1 code implementation • 5 Jul 2022 • Junyu Xie, Weidi Xie, Andrew Zisserman
The objective of this paper is a model that is able to discover, track and segment multiple moving objects in a video.
Ranked #3 on Unsupervised Object Segmentation on FBMS-59
no code implementations • 26 Jun 2022 • Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang
We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos.
1 code implementation • 14 Jun 2022 • Ziheng Zhao, Tianjiao Zhang, Weidi Xie, Yanfeng Wang, Ya Zhang
This paper considers the problem of undersampled MRI reconstruction.
2 code implementations • 14 Jun 2022 • Gyungin Shin, Weidi Xie, Samuel Albanie
Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment.
1 code implementation • CVPR 2022 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment.
2 code implementations • 30 Mar 2022 • Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma
The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.
1 code implementation • 23 Mar 2022 • Gyungin Shin, Samuel Albanie, Weidi Xie
In this paper, we tackle the challenging task of unsupervised salient object detection (SOD) by leveraging spectral clustering on self-supervised features.
Ranked #1 on Unsupervised Saliency Detection on ECSSD
1 code implementation • CVPR 2022 • Prannay Kaul, Weidi Xie, Andrew Zisserman
The objective of this paper is few-shot object detection (FSOD) -- the task of expanding an object detector for a new category given only a few instances for training.
no code implementations • 8 Dec 2021 • Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.
1 code implementation • 8 Dec 2021 • Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie
Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation.
Ranked #5 on Zero-Shot Action Detection on ActivityNet-1.3
no code implementations • CVPR 2022 • Charig Yang, Weidi Xie, Andrew Zisserman
In this paper, we present a framework for reading analog clocks in natural images or videos.
no code implementations • 24 Sep 2021 • Pak-Hei Yeung, Linde Hesse, Moska Aliasi, Monique Haak, the INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete
The objective of this work is to achieve sensorless reconstruction of a 3D volume from a set of 2D freehand ultrasound images with deep implicit representation.
no code implementations • 7 Sep 2021 • Xiaoman Zhang, Weidi Xie, Chaoqin Huang, Yanfeng Wang, Ya Zhang, Xin Chen, Qi Tian
In this paper, we target self-supervised representation learning for zero-shot tumor segmentation.
1 code implementation • 26 May 2021 • Pak-Hei Yeung, Ana I. L. Namburete, Weidi Xie
The objective of this work is to segment any arbitrary structures of interest (SOI) in 3D volumes by annotating only a single slice (i.e., semi-automatic 3D segmentation).
no code implementations • ICCV 2021 • Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie
We additionally evaluate on a challenging camouflage dataset (MoCA), significantly outperforming the other self-supervised approaches, and comparing favourably to the top supervised approach, highlighting the importance of motion cues, and the potential bias towards visual appearance in existing video segmentation models.
Ranked #7 on Unsupervised Object Segmentation on DAVIS 2016
2 code implementations • 13 Apr 2021 • Gyungin Shin, Weidi Xie, Samuel Albanie
A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training.
1 code implementation • CVPR 2021 • Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
We show that our algorithm achieves state-of-the-art performance on the popular Flickr SoundNet dataset.
2 code implementations • 26 Mar 2021 • Ben Jaderberg, Lewis W. Anderson, Weidi Xie, Samuel Albanie, Martin Kiffner, Dieter Jaksch
The resurgence of self-supervised learning, whereby a deep learning model generates its own supervisory signal from the data, promises a scalable way to tackle the dramatically increasing size of real-world data sets without human annotation.
5 code implementations • 14 Feb 2021 • ZiRui Wang, Shangzhe Wu, Weidi Xie, Min Chen, Victor Adrian Prisacariu
Considering the problem of novel view synthesis (NVS) from only a set of 2D images, we simplify the training process of Neural Radiance Field (NeRF) on forward-facing scenes by removing the requirement of known or pre-computed camera parameters, including both intrinsics and 6DoF poses.
no code implementations • 12 Dec 2020 • Arsha Nagrani, Joon Son Chung, Jaesung Huh, Andrew Brown, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman
We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020.
no code implementations • 23 Nov 2020 • Hala Lamdouar, Charig Yang, Weidi Xie, Andrew Zisserman
We make the following three contributions: (i) We propose a novel architecture that consists of two essential components for breaking camouflage, namely, a differentiable registration module to align consecutive frames based on the background, which effectively emphasises the object boundary in the difference image, and a motion segmentation module with memory that discovers the moving objects, while maintaining the object permanence even when motion is absent at some point.
1 code implementation • NeurIPS 2020 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is visual-only self-supervised video representation learning.
Ranked #8 on Self-Supervised Action Recognition Linear on HMDB51
1 code implementation • 16 Sep 2020 • Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein
We present a method for retiming people in an ordinary, natural video -- manipulating and editing the time in which different motions of individuals in the video occur.
no code implementations • 1 Sep 2020 • Weidi Xie, Jeffrey Byrne, Andrew Zisserman
We describe three use cases on the public IJB-C face verification benchmark: (i) to improve 1:1 image-based verification error rates by rejecting low-quality face images; (ii) to improve quality-score-based fusion performance on the 1:1 set-based verification benchmark; and (iii) its use as a quality measure for selecting high-quality (unblurred, well-lit, more frontal) faces from a collection, e.g., for automatic enrolment or display.
1 code implementation • ECCV 2020 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.
2 code implementations • ECCV 2020 • Andrew Brown, Weidi Xie, Vicky Kalogeiton, Andrew Zisserman
Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods.
Ranked #4 on Vehicle Re-Identification on VehicleID Medium
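The usual remedy, as in this line of work, is to relax the non-differentiable ranking indicator 1[s_j > s_i] into a sigmoid; a toy single-query numpy sketch of such a smoothed AP (the actual method operates on batched embeddings and backpropagates through this relaxation):

```python
import numpy as np

def smooth_ap(scores, labels, tau=0.01):
    """Sigmoid-relaxed Average Precision for one query (higher is better)."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x / tau))
    pos = np.where(labels == 1)[0]
    ap = 0.0
    for i in pos:
        d = sig(scores - scores[i])   # relaxed 1[s_j > s_i] for all j
        d[i] = 0.0
        rank_all = 1.0 + d.sum()      # relaxed rank among all items
        rank_pos = 1.0 + d[pos].sum() # relaxed rank among positives only
        ap += rank_pos / rank_all
    return ap / len(pos)

# Positives ranked first -> AP near 1; reversing the ranking lowers it.
scores = np.array([0.9, 0.8, 0.3, 0.2])
labels = np.array([1, 1, 0, 0])
good = smooth_ap(scores, labels)
bad = smooth_ap(scores[::-1].copy(), labels)
```

As the temperature `tau` shrinks, the sigmoid approaches the hard indicator and the relaxed score approaches true AP.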
no code implementations • 22 Jun 2020 • Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a.
3 code implementations • 29 Apr 2020 • Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman
Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.
2 code implementations • CVPR 2020 • Zihang Lai, Erika Lu, Weidi Xie
Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.
Ranked #4 on Unsupervised Video Object Segmentation on DAVIS 2017 (val) (using extra training data)
Semantic Segmentation • Semi-Supervised Video Object Segmentation +2
no code implementations • 5 Dec 2019 • Joon Son Chung, Arsha Nagrani, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A. Reynolds, Andrew Zisserman
The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or 'in the wild' data.
1 code implementation • 10 Sep 2019 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.
Ranked #10 on Self-Supervised Action Recognition Linear on UCF101
Representation Learning • Self-Supervised Action Recognition +3
no code implementations • 6 Sep 2019 • Dan Xu, Weidi Xie, Andrew Zisserman
In this paper we propose a geometry-aware model for video object detection.
no code implementations • 14 Aug 2019 • Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman
We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.
1 code implementation • 2 May 2019 • Zihang Lai, Weidi Xie
Fourth, in order to shed light on the potential of self-supervised learning on the task of video correspondence flow, we probe the upper bound by training on additional data, i.e., more diverse videos, further demonstrating significant improvements on video segmentation.
Self-Supervised Learning • Semi-Supervised Video Object Segmentation +4
10 code implementations • 26 Feb 2019 • Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman
The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and also contain irrelevant signals.
1 code implementation • 1 Nov 2018 • Erika Lu, Weidi Xie, Andrew Zisserman
The model achieves competitive performance on cell and crowd counting datasets, and surpasses the state-of-the-art on the car dataset using only three training images.
no code implementations • ECCV 2018 • Weidi Xie, Li Shen, Andrew Zisserman
Our contributions are: (i) we propose a Deep Comparator Network (DCN) that can ingest a pair of sets (each may contain a variable number of images) as inputs and compute a similarity between the pair -- this involves attending to multiple discriminative local regions (landmarks) and comparing local descriptors between pairs of faces; (ii) to encourage high-quality representations for each set, internal competition is introduced for recalibration based on the landmark score; (iii) inspired by image retrieval, a novel hard-sample mining regime is proposed to control the sampling process, such that the DCN is complementary to standard image classification models.
1 code implementation • 24 Jul 2018 • Weidi Xie, Andrew Zisserman
In this paper, we design a neural network architecture that learns to aggregate based on both "visual" quality (resolution, illumination), and "content" quality (relative importance for discriminative classification).
Ranked #1 on Face Recognition on BTS3.1
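Quality-aware aggregation of a set of face descriptors can be sketched as a softmax-weighted mean, where a predicted quality score gates each image's contribution (a toy with assumed shapes; the paper's network learns both the descriptors and the weights):

```python
import numpy as np

def aggregate(features, quality_logits):
    """Softmax-weighted mean of per-image face descriptors: higher predicted
    quality -> larger contribution to the set-level template."""
    w = np.exp(quality_logits - quality_logits.max())  # stable softmax
    w = w / w.sum()
    return (w[:, None] * features).sum(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 128))                # descriptors for 5 faces of one identity
quality = np.array([2.0, -1.0, 0.5, -3.0, 1.0])  # e.g. sharp/frontal faces score high
template = aggregate(feats, quality)
```

In the limit of one dominant quality score, the template collapses to that single image's descriptor, so blurred or poorly lit faces are effectively ignored.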
no code implementations • 3 Nov 2017 • Davis M. Vigneault, Weidi Xie, Carolyn Y. Ho, David A. Bluemke, J. Alison Noble
Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses.
23 code implementations • 23 Oct 2017 • Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman
The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.
Ranked #1 on Face Verification on IJB-C (training dataset metric)
no code implementations • 17 Jul 2017 • Yipeng Hu, Eli Gibson, Li-Lin Lee, Weidi Xie, Dean C. Barratt, Tom Vercauteren, J. Alison Noble
Sonography synthesis has a wide range of applications, including medical procedure simulation, clinical training and multimodality image registration.
no code implementations • 12 Apr 2017 • Davis M. Vigneault, Weidi Xie, David A. Bluemke, J. Alison Noble
Feature tracking Cardiac Magnetic Resonance (CMR) has recently emerged as an area of interest for quantification of regional cardiac function from balanced, steady state free precession (SSFP) cine sequences.