Search Results for author: Weidi Xie

Found 113 papers, 68 papers with code

LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models

2 code implementations · 29 Sep 2024 · Haolin Li, Yuhang Zhou, Ziheng Zhao, Siyuan Du, Jiangchao Yao, Weidi Xie, Ya Zhang, Yanfeng Wang

To accomplish the above objective, we propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.

3D Medical Imaging Segmentation, Medical Image Classification
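The idea of task-specific low-rank expert modules can be illustrated with a LoRA-style layer: a shared weight plus one low-rank update per task. This is a minimal sketch, assuming each expert adds `B_t (A_t x)` to a shared linear map; `LowRankExpertLinear`, `matvec`, and routing by a task index are hypothetical names and choices, not taken from the LoRKD code, and the knowledge separation convolution is not modelled.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LowRankExpertLinear:
    """Hypothetical shared linear layer with one low-rank expert per task.

    For task t the output is y = W x + B_t (A_t x), where A_t is (rank x d_in)
    and B_t is (d_out x rank). Only the selected expert participates, so during
    training a task's gradients would reach only its own A_t, B_t (plus shared W).
    """

    def __init__(self, weight: Matrix, experts: List[Tuple[Matrix, Matrix]]):
        self.weight = weight    # shared (d_out x d_in) weight
        self.experts = experts  # one (A_t, B_t) pair per task

    def forward(self, x: Vector, task: int) -> Vector:
        base = matvec(self.weight, x)
        a, b = self.experts[task]
        update = matvec(b, matvec(a, x))  # pass through the rank-r bottleneck
        return [u + v for u, v in zip(base, update)]


w = [[1.0, 0.0], [0.0, 1.0]]
expert0 = ([[1.0, 1.0]], [[0.0], [0.0]])  # B = 0: no task-specific change
expert1 = ([[1.0, 1.0]], [[1.0], [2.0]])  # rank-1 update
layer = LowRankExpertLinear(w, [expert0, expert1])
print(layer.forward([1.0, 2.0], task=0))  # [1.0, 2.0]
print(layer.forward([1.0, 2.0], task=1))  # [4.0, 8.0]
```

Keeping the rank small means each task adds only `rank * (d_in + d_out)` parameters on top of the shared weight.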

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

no code implementations · 26 Aug 2024 · Qirui Chen, Shangzhe Di, Weidi Xie

Trained on our visual instruction data, GeLM demonstrates improved multi-hop grounding and reasoning capabilities, setting a new baseline for this challenging task.

Language Modelling, Large Language Model (+3 more)

Can Visual Foundation Models Achieve Long-term Point Tracking?

no code implementations · 24 Aug 2024 · Görkay Aydemir, Weidi Xie, Fatma Güney

Large-scale vision foundation models have demonstrated remarkable success across various tasks, underscoring their robust generalization capabilities.

Point Tracking

Towards Evaluating and Building Versatile Large Language Models for Medicine

1 code implementation · 22 Aug 2024 · Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie

To promote further advancements in the application of LLMs to clinical challenges, we have made the MedS-Ins dataset fully accessible and invite the research community to contribute to its expansion. Additionally, we have launched a dynamic leaderboard for MedS-Bench, whose test set we plan to update regularly to track progress and enhance the adaptation of general LLMs to the medical domain.

Multiple-choice, named-entity-recognition (+2 more)

AutoRG-Brain: Grounded Report Generation for Brain MRI

no code implementations · 23 Jul 2024 · Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

To address these challenges, we initiate a series of work on grounded Automatic Report Generation (AutoRG), starting from the brain MRI interpretation system, which supports the delineation of brain structures, the localization of anomalies, and the generation of well-organized findings.

Anomaly Localization

AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description

1 code implementation · 22 Jul 2024 · Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Our objective is to generate Audio Descriptions (ADs) for both movies and TV series in a training-free manner.

Sentence

EchoSight: Advancing Visual-Language Models with Wiki Knowledge

no code implementations · 17 Jul 2024 · Yibin Yan, Weidi Xie

Knowledge-based Visual Question Answering (KVQA) tasks require answering questions about images using extensive background knowledge.

Question Answering, RAG (+2 more)

A Sanity Check for AI-generated Image Detection

1 code implementation · 27 Jun 2024 · Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

This effectively enables the model to discern AI-generated images based on semantics or contextual information; second, we select the highest-frequency and lowest-frequency patches in the image and compute low-level patch-wise features, aiming to detect AI-generated images by low-level artifacts such as noise patterns and anti-aliasing.
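The patch-selection step can be sketched as follows. As an assumption for illustration, frequency content is approximated by the mean absolute difference between neighbouring pixels (a crude high-pass measure) rather than a Fourier transform; `split_patches`, `high_freq_score`, and `select_extreme_patches` are hypothetical names, and the paper's actual low-level feature extractor is not reproduced.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def split_patches(img: Image, size: int) -> List[Tuple[int, int, Image]]:
    """Split an image into non-overlapping size x size patches (row, col, patch)."""
    patches = []
    for r in range(0, len(img) - size + 1, size):
        for c in range(0, len(img[0]) - size + 1, size):
            patches.append((r, c, [row[c:c + size] for row in img[r:r + size]]))
    return patches


def high_freq_score(patch: Image) -> float:
    """Mean absolute difference between horizontally/vertically adjacent pixels.

    A crude stand-in for high-frequency energy (assumption: no FFT is used).
    """
    diffs = []
    for r, row in enumerate(patch):
        for c, v in enumerate(row):
            if c + 1 < len(row):
                diffs.append(abs(v - row[c + 1]))
            if r + 1 < len(patch):
                diffs.append(abs(v - patch[r + 1][c]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def select_extreme_patches(img: Image, size: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the top-left corners of the highest- and lowest-scoring patches."""
    scored = [(high_freq_score(p), (r, c)) for r, c, p in split_patches(img, size)]
    return max(scored)[1], min(scored)[1]


img = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0.5, 0, 0],
    [0, 0.5, 0, 0.2],
]
print(select_extreme_patches(img, 2))  # ((0, 2), (0, 0))
```

The checkerboard patch in the top-right corner scores highest and the flat top-left patch scores lowest; low-level features would then be computed on those two patches.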

MatchTime: Towards Automatic Soccer Game Commentary Generation

1 code implementation · 26 Jun 2024 · Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie

Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience.

RaTEScore: A Metric for Radiology Report Generation

1 code implementation · 24 Jun 2024 · Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models.

Entity Embeddings, Language Modelling (+2 more)

Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

no code implementations · 25 Apr 2024 · Charig Yang, Weidi Xie, Andrew Zisserman

We also introduce a transformer-based model for ordering image sequences of arbitrary length, with built-in attribution maps.

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

1 code implementation · 25 Apr 2024 · Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by enabling them to be trained to generate text based on given segmentation regions, which is unattainable with previous relevant datasets.

Segmentation, Sentence (+2 more)

AutoAD III: The Prequel -- Back to the Pixels

no code implementations · 22 Apr 2024 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.

Moving Object Segmentation: All You Need Is SAM (and Flow)

1 code implementation · 18 Apr 2024 · Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman

The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video.

Motion Segmentation, Object (+6 more)

Knowledge-enhanced Visual-Language Pretraining for Computational Pathology

1 code implementation · 15 Apr 2024 · Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang

In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain-specific knowledge in pathology.

Cross-Modal Retrieval, Language Modelling (+4 more)

Towards Building Multilingual Language Model for Medicine

1 code implementation · 21 Feb 2024 · Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie

The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions.

Domain Adaptation, Language Modelling (+1 more)

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

no code implementations · CVPR 2024 · Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma

The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.

Object, object-detection (+1 more)

Synchformer: Efficient Synchronization from Sparse Cues

2 code implementations · 29 Jan 2024 · Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse.

Audio-Visual Synchronization

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation · CVPR 2024 · Chang Liu, Haoning Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.

Text-to-Image Generation, Visual Storytelling

Retrieval-Augmented Egocentric Video Captioning

no code implementations · CVPR 2024 · Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos to enhance the video captioning of egocentric videos.

Representation Learning, Retrieval (+1 more)

AutoAD III: The Prequel - Back to the Pixels

no code implementations · CVPR 2024 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.

Amodal Ground Truth and Completion in the Wild

1 code implementation · CVPR 2024 · Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.

Image Segmentation, Segmentation (+1 more)

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

1 code implementation · 28 Dec 2023 · Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

Our main contributions are threefold: (i) for dataset construction, we construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; then we build the largest and most comprehensive segmentation dataset for training by collecting over 22K 3D medical image scans from 72 segmentation datasets, across 497 classes, with careful standardization of both image scans and label space; (ii) for architecture design, we propose to inject medical knowledge into a text encoder via contrastive learning, and then formulate a universal segmentation model that can be prompted by feeding in medical terminologies in text form; (iii) as a result, we have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters), which demonstrate performance comparable to 72 specialist nnU-Nets trained on each dataset/subset.

Anatomy, Contrastive Learning (+5 more)
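Injecting knowledge into a text encoder "via contrastive learning" typically means pulling matched embedding pairs together while pushing mismatched ones apart. A minimal sketch of a generic one-directional InfoNCE objective on precomputed embeddings, assuming pairs such as (anatomical term, knowledge description); the pairing, the temperature of 0.07, and the names `cosine` and `info_nce` are illustrative assumptions, not SAT's actual training setup.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.07) -> float:
    """Mean one-directional InfoNCE loss.

    anchors[i] (e.g. a term embedding) should score highest against positives[i]
    (e.g. its knowledge description); all other positives act as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # = -log softmax(logits)[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(anchors, anchors))        # close to 0: every pair is matched
print(info_nce(anchors, anchors[::-1]))  # about 14.3: every pair is mismatched
```

Minimising this loss over many pairs is what shapes the encoder; in practice a symmetric version (descriptions to terms as well) is also common.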

Multi-Sentence Grounding for Long-term Instructional Video

no code implementations · 21 Dec 2023 · Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we aim to establish an automatic, scalable pipeline for denoising the large-scale instructional dataset and to construct HowToStep, a high-quality video-text dataset with supervision from multiple descriptive steps.

Denoising, Descriptive (+5 more)

Appearance-Based Refinement for Object-Centric Motion Segmentation

no code implementations · 18 Dec 2023 · Junyu Xie, Weidi Xie, Andrew Zisserman

The goal of this paper is to discover, segment, and track independently moving objects in complex visual scenes.

Motion Segmentation, Object (+5 more)

Grounded Question-Answering in Long Egocentric Videos

1 code implementation · CVPR 2024 · Shangzhe Di, Weidi Xie

Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability in certain fields, such as robotics.

Video Grounding, Video Question Answering (+1 more)

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

1 code implementation · 15 Oct 2023 · Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie

Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public.

Anatomy, Computed Tomography (CT) (+2 more)

A General Protocol to Probe Large Vision Models for 3D Physical Understanding

1 code implementation · 10 Oct 2023 · Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

To this end, we make the following contributions: (i) We introduce a general and lightweight protocol to evaluate whether features of an off-the-shelf large vision model encode a number of physical 'properties' of the 3D scene, by training discriminative classifiers on the features for these properties.
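The probing protocol, training a lightweight discriminative classifier on frozen features, can be sketched as a logistic-regression probe. A minimal sketch, assuming the features have already been extracted from the frozen model and the property is binary; plain batch gradient descent, `lr`, `epochs`, and the function names are illustrative choices, not the paper's.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features: List[Vector], labels: List[int],
                       lr: float = 0.5, epochs: int = 200) -> Tuple[Vector, float]:
    """Fit a logistic-regression probe (w, b) by batch gradient descent.

    The features are fixed inputs: only w and b are learned, mirroring a
    probe on top of a frozen backbone.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w: Vector, b: float, features: List[Vector], labels: List[int]) -> float:
    """Fraction of examples whose thresholded probe output matches the label."""
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in features]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy stand-in for frozen features labelled with a binary scene property.
feats = [[2.0, 0.1], [1.5, -0.2], [-1.8, 0.3], [-2.2, -0.1]]
labels = [1, 1, 0, 0]
w, b = train_linear_probe(feats, labels)
print(probe_accuracy(w, b, feats, labels))  # 1.0
```

Because the probe is linear, high accuracy on held-out data suggests the property is linearly decodable from the features themselves rather than learned by the probe.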

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

no code implementations · 10 Oct 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling, Text Generation

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning

no code implementations · 20 Sep 2023 · Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

Recently, the AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets.

Audio captioning, Caption Generation (+6 more)

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

1 code implementation · 13 Sep 2023 · Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang

Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, and a range of computer-aided artificial intelligence methods have been proposed for it.

The Making and Breaking of Camouflage

no code implementations · ICCV 2023 · Hala Lamdouar, Weidi Xie, Andrew Zisserman

We also incorporate the proposed camouflage score into a generative model as an auxiliary loss and show that effective camouflage images or videos can be synthesised in a scalable manner.

Diagnosing Human-object Interaction Detectors

1 code implementation · 16 Aug 2023 · Fangrui Zhu, Yiming Xie, Weidi Xie, Huaizu Jiang

To address this issue, in this paper, we introduce a diagnosis toolbox that provides a detailed quantitative breakdown analysis of HOI detection models, inspired by the success of object detection diagnosis toolboxes.

Classification, Human-Object Interaction Detection (+3 more)

Joint-Relation Transformer for Multi-Person Motion Prediction

1 code implementation · ICCV 2023 · Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya Zhang, Yanfeng Wang

Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people.

motion prediction, Relation

Boost Video Frame Interpolation via Motion Adaptation

1 code implementation · 24 Jun 2023 · Haoning Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.

Motion Estimation, Video Frame Interpolation

arXiVeri: Automatic table verification with GPT

1 code implementation · 13 Jun 2023 · Gyungin Shin, Weidi Xie, Samuel Albanie

In this paper, we propose to meet this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources.

Zero-shot Composed Text-Image Retrieval

1 code implementation · 12 Jun 2023 · Yikun Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expressive ability.

Image Retrieval, Retrieval (+1 more)

Multi-Modal Classifiers for Open-Vocabulary Object Detection

no code implementations · 8 Jun 2023 · Prannay Kaul, Weidi Xie, Andrew Zisserman

The goal of this paper is open-vocabulary object detection (OVOD): building a model that can detect objects beyond the set of categories seen at training, thus enabling the user to specify categories of interest at inference without the need for model retraining.

Language Modelling, Large Language Model (+3 more)

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation · 1 Jun 2023 · Chang Liu, Haoning Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization, Style Transfer (+2 more)

Annotation-free Audio-Visual Segmentation

no code implementations · 18 May 2023 · Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie

The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks.

Image Segmentation, Segmentation (+1 more)

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

2 code implementations · 17 May 2023 · Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie

Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret and answer questions based on medical images.

Benchmarking, Generative Visual Question Answering (+5 more)

PMC-LLaMA: Towards Building Open-source Language Models for Medicine

1 code implementation · 27 Apr 2023 · Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model towards the medical domain, which involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning.

Language Modelling, Natural Language Understanding (+1 more)

Zero-shot Unsupervised Transfer Instance Segmentation

1 code implementation · 27 Apr 2023 · Gyungin Shin, Samuel Albanie, Weidi Xie

Segmentation is a core computer vision competency, with applications spanning a broad range of scientifically and economically valuable domains.

Instance Segmentation, Segmentation (+1 more)

Towards Open-Vocabulary Video Instance Segmentation

1 code implementation · ICCV 2023 · Haochen Wang, Cilin Yan, Shuai Wang, Xiaolong Jiang, Xu Tang, Yao Hu, Weidi Xie, Efstratios Gavves

Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos.

Instance Segmentation, Segmentation (+3 more)

AutoAD: Movie Description in Context

1 code implementation · CVPR 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.

Image Captioning, Text Generation

Collaboration Helps Camera Overtake LiDAR in 3D Detection

1 code implementation · CVPR 2023 · Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang

Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems.

Depth Estimation

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations · 21 Mar 2023 · Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under low-shot (zero-shot and few-shot) scenarios, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, including categories not seen at training time.

Action Classification, Temporal Action Localization

Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

1 code implementation · 27 Feb 2023 · Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie

While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge.

Natural Language Understanding, Representation Learning

OvarNet: Towards Open-vocabulary Object Attribute Recognition

1 code implementation · CVPR 2023 · Keyan Chen, Xiaolong Jiang, Yao Hu, Xu Tang, Yan Gao, Jianqi Chen, Weidi Xie

In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario.

Ranked #1 on Open Vocabulary Attribute Detection on OVAD benchmark (using extra training data)

Attribute, Knowledge Distillation (+5 more)

Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

1 code implementation · CVPR 2023 · Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie

The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities.

Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation (+1 more)

Open-vocabulary Object Segmentation with Diffusion Models

1 code implementation · ICCV 2023 · Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model in the form of segmentation maps, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt.

Image Segmentation, Object (+3 more)

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

no code implementations · 5 Jan 2023 · Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis, Self-Supervised Learning (+1 more)

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description

no code implementations · ICCV 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling, Text Generation

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

no code implementations · ICCV 2023 · Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis, Triplet

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

1 code implementation · 27 Oct 2022 · Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya Zhang, Weidi Xie

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.

Image Segmentation, Language Modelling (+4 more)

A Tri-Layer Plugin to Improve Occluded Detection

1 code implementation · 18 Oct 2022 · Guanqi Zhan, Weidi Xie, Andrew Zisserman

To this end we make the following four contributions: (1) We propose a simple 'plugin' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.

Instance Segmentation, Object (+3 more)

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

2 code implementations · 13 Oct 2022 · Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.

Audio-Visual Synchronization

Turbo Training with Token Dropout

no code implementations · 10 Oct 2022 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is an efficient training method for video tasks.

Action Classification, Classification (+1 more)
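Token dropout makes each training step cheaper by feeding the network only a subset of the input tokens; since self-attention cost grows quadratically with token count, dropping tokens saves more than proportionally. A minimal sketch, assuming uniform random dropping at a fixed `keep_ratio`; the actual selection strategy and ratio used in the paper are not stated here, and `drop_tokens` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def drop_tokens(tokens: Sequence[T], keep_ratio: float,
                rng: Optional[random.Random] = None) -> List[T]:
    """Keep a random subset of about keep_ratio * len(tokens) tokens, in original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return []
    rng = rng or random.Random()
    n_keep = max(1, round(keep_ratio * len(tokens)))  # never drop every token
    kept_idx = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept_idx]


patch_tokens = list(range(16))  # e.g. 16 spatio-temporal patch tokens
kept = drop_tokens(patch_tokens, keep_ratio=0.25, rng=random.Random(0))
print(len(kept))  # 4
```

Sorting the sampled indices keeps the surviving tokens in their original order, so any positional information attached to them stays consistent.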

A Simple Plugin for Transforming Images to Arbitrary Scales

no code implementations · 7 Oct 2022 · Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

Existing super-resolution models are often specialized for one scale, fundamentally limiting their use in practical scenarios.

Super-Resolution

NamedMask: Distilling Segmenters from Complementary Foundation Models

1 code implementation · 22 Sep 2022 · Gyungin Shin, Weidi Xie, Samuel Albanie

Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images.

Data Augmentation, Object (+1 more)
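Constructing category-specific archives with CLIP amounts to ranking an unlabelled image pool by similarity to each category's text embedding and keeping the top matches. A minimal sketch on precomputed embeddings, assuming image and text vectors are already available; no CLIP API is called, and `build_archives` and `k` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_archives(image_embs: Dict[str, Vector], text_embs: Dict[str, Vector],
                   k: int) -> Dict[str, List[str]]:
    """For each category name, keep the ids of the k most similar images."""
    archives = {}
    for category, text_vec in text_embs.items():
        ranked = sorted(image_embs,
                        key=lambda img_id: cosine(image_embs[img_id], text_vec),
                        reverse=True)
        archives[category] = ranked[:k]
    return archives


images = {"a": [1.0, 0.1], "b": [0.9, 0.2], "c": [0.0, 1.0], "d": [0.1, 0.95]}
texts = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
print(build_archives(images, texts, k=2))  # {'cat': ['a', 'b'], 'dog': ['c', 'd']}
```

A larger `k` gives bigger archives at the cost of admitting more noisy matches.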

CounTR: Transformer-based Generalised Visual Counting

1 code implementation · 29 Aug 2022 · Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e., zero-shot or few-shot counting.

Exemplar-Free Counting, Self-Supervised Learning

Transforming the Interactive Segmentation for Medical Imaging

no code implementations · 20 Aug 2022 · Wentao Liu, Chaofan Ma, Yuhuan Yang, Weidi Xie, Ya Zhang

The goal of this paper is to interactively refine the automatic segmentation of challenging structures on which models fall behind human performance, either due to the scarcity of available annotations or the difficult nature of the problem itself, for example, segmenting cancer or small organs.

Decoder, Interactive Segmentation (+1 more)

Aerial Monocular 3D Object Detection

no code implementations · 8 Aug 2022 · Yue Hu, Shaoheng Fang, Weidi Xie, Siheng Chen

To fill the gap, this work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space.

Autonomous Driving, Monocular 3D Object Detection (+2 more)

Segmenting Moving Objects via an Object-Centric Layered Representation

1 code implementation · 5 Jul 2022 · Junyu Xie, Weidi Xie, Andrew Zisserman

The objective of this paper is a model that is able to discover, track and segment multiple moving objects in a video.

Motion Segmentation, Object (+4 more)

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

no code implementations · 26 Jun 2022 · Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang

We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos.

Cross-Modal Retrieval, Representation Learning (+1 more)

ReCo: Retrieve and Co-segment for Zero-shot Transfer

2 code implementations · 14 Jun 2022 · Gyungin Shin, Weidi Xie, Samuel Albanie

Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment.

Retrieval, Segmentation (+1 more)

Temporal Alignment Networks for Long-term Video

1 code implementation · CVPR 2022 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is a temporal alignment network that ingests long-term video sequences and associated text sentences in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, determine its alignment.

Action Recognition, Action Segmentation (+4 more)
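The two outputs described for the temporal alignment network (whether a sentence is alignable, and if so where) can be illustrated on a sentence-by-timestep similarity matrix. A minimal sketch, assuming such scores have already been produced by some model; the thresholding rule, the `threshold` value, and `align_sentences` are illustrative assumptions, not the paper's procedure.

```python
from typing import List, Optional


def align_sentences(similarity: List[List[float]], threshold: float) -> List[Optional[int]]:
    """Return, per sentence, the best-matching timestep or None if deemed not alignable.

    similarity[i][t] is an assumed score between sentence i and video timestep t;
    a sentence counts as alignable only if its peak score reaches `threshold`.
    """
    alignments: List[Optional[int]] = []
    for row in similarity:
        best_t = max(range(len(row)), key=row.__getitem__)
        alignments.append(best_t if row[best_t] >= threshold else None)
    return alignments


sim = [
    [0.1, 0.8, 0.2],    # sentence 0 peaks at timestep 1
    [0.05, 0.1, 0.15],  # sentence 1 never matches well
    [0.3, 0.2, 0.9],    # sentence 2 peaks at timestep 2
]
print(align_sentences(sim, threshold=0.5))  # [1, None, 2]
```

Returning `None` rather than a forced timestep keeps unalignable narration (for example, talk unrelated to the visuals) out of downstream supervision.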

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations · 30 Mar 2022 · Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling, Object

Unsupervised Salient Object Detection with Spectral Cluster Voting

1 code implementation · 23 Mar 2022 · Gyungin Shin, Samuel Albanie, Weidi Xie

In this paper, we tackle the challenging task of unsupervised salient object detection (SOD) by leveraging spectral clustering on self-supervised features.

Clustering, Object (+5 more)

Label, Verify, Correct: A Simple Few Shot Object Detection Method

1 code implementation · CVPR 2022 · Prannay Kaul, Weidi Xie, Andrew Zisserman

The objective of this paper is few-shot object detection (FSOD) -- the task of expanding an object detector for a new category given only a few instances for training.

Benchmarking, Few-Shot Object Detection (+1 more)

Audio-Visual Synchronisation in the wild

no code implementations · 8 Dec 2021 · Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.

Lip Reading

Prompting Visual-Language Models for Efficient Video Understanding

1 code implementation · 8 Dec 2021 · Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie

Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation.

Action Recognition, Language Modelling (+4 more)

It's About Time: Analog Clock Reading in the Wild

no code implementations · CVPR 2022 · Charig Yang, Weidi Xie, Andrew Zisserman

In this paper, we present a framework for reading analog clocks in natural images or videos.

ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation

no code implementations · 24 Sep 2021 · Pak-Hei Yeung, Linde Hesse, Moska Aliasi, Monique Haak, the INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete

The objective of this work is to achieve sensorless reconstruction of a 3D volume from a set of 2D freehand ultrasound images with deep implicit representation.

SSIM

Sli2Vol: Annotate a 3D Volume from a Single Slice with Self-Supervised Learning

1 code implementation · 26 May 2021 · Pak-Hei Yeung, Ana I. L. Namburete, Weidi Xie

The objective of this work is to segment arbitrary structures of interest (SOI) in 3D volumes by annotating only a single slice (i.e., semi-automatic 3D segmentation).

Segmentation, Self-Supervised Learning

Self-supervised Video Object Segmentation by Motion Grouping

no code implementations · ICCV 2021 · Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie

We additionally evaluate on a challenging camouflage dataset (MoCA), significantly outperforming the other self-supervised approaches, and comparing favourably to the top supervised approach, highlighting the importance of motion cues, and the potential bias towards visual appearance in existing video segmentation models.

Motion Segmentation, Object (+6 more)

All you need are a few pixels: semantic segmentation with PixelPick

2 code implementations · 13 Apr 2021 · Gyungin Shin, Weidi Xie, Samuel Albanie

A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training.

Active Learning, Segmentation (+1 more)

Quantum Self-Supervised Learning

2 code implementations · 26 Mar 2021 · Ben Jaderberg, Lewis W. Anderson, Weidi Xie, Samuel Albanie, Martin Kiffner, Dieter Jaksch

The resurgence of self-supervised learning, whereby a deep learning model generates its own supervisory signal from the data, promises a scalable way to tackle the dramatically increasing size of real-world data sets without human annotation.

Self-Supervised Learning

NeRF--: Neural Radiance Fields Without Known Camera Parameters

5 code implementations · 14 Feb 2021 · Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, Victor Adrian Prisacariu

Considering the problem of novel view synthesis (NVS) from only a set of 2D images, we simplify the training process of Neural Radiance Field (NeRF) on forward-facing scenes by removing the requirement of known or pre-computed camera parameters, including both intrinsics and 6DoF poses.

Novel View Synthesis

Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation

no code implementations · 23 Nov 2020 · Hala Lamdouar, Charig Yang, Weidi Xie, Andrew Zisserman

We make the following three contributions: (i) we propose a novel architecture with two essential components for breaking camouflage: a differentiable registration module that aligns consecutive frames based on the background, which effectively emphasises the object boundary in the difference image, and a motion segmentation module with memory that discovers the moving objects while maintaining object permanence even when motion is temporarily absent.

Motion Segmentation Object +3

Layered Neural Rendering for Retiming People in Video

1 code implementation 16 Sep 2020 Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein

We present a method for retiming people in an ordinary, natural video -- manipulating and editing the time in which different motions of individuals in the video occur.

Neural Rendering

Inducing Predictive Uncertainty Estimation for Face Recognition

no code implementations 1 Sep 2020 Weidi Xie, Jeffrey Byrne, Andrew Zisserman

We describe three use cases on the public IJB-C face verification benchmark: (i) to improve 1:1 image-based verification error rates by rejecting low-quality face images; (ii) to improve quality-score-based fusion performance on the 1:1 set-based verification benchmark; and (iii) to serve as a quality measure for selecting high-quality (unblurred, well-lit, more frontal) faces from a collection, e.g. for automatic enrolment or display.

Face Recognition Face Verification

Memory-augmented Dense Predictive Coding for Video Representation Learning

1 code implementation ECCV 2020 Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.

Action Classification Action Recognition +5

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

2 code implementations ECCV 2020 Andrew Brown, Weidi Xie, Vicky Kalogeiton, Andrew Zisserman

Optimising a ranking-based metric such as Average Precision (AP) is notoriously challenging because it is non-differentiable, and hence cannot be optimised directly with gradient-descent methods.

Image Instance Retrieval Metric Learning +2
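The non-differentiability above comes from the hard indicator used to count how many items outrank each positive. The sketch below is a hypothetical illustration and is not the authors' code. It computes AP for a single query from similarity scores and lets the caller choose the indicator: a hard step, or a temperature-scaled sigmoid as a smooth surrogate. The function names, the temperature `tau=0.01` and the toy scores are all assumptions made for illustration. Consult the paper for its exact formulation.

```python
import math
from typing import Callable, Sequence

# Illustrative sketch only: names, tau and the toy data are assumptions,
# not taken from the Smooth-AP paper or its code.

def step(x: float) -> float:
    """Hard indicator 1[x > 0]; its gradient is zero almost everywhere."""
    return 1.0 if x > 0 else 0.0

def smooth_step(x: float, tau: float = 0.01) -> float:
    """Temperature-scaled sigmoid; approaches step(x) as tau -> 0."""
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)

def average_precision(scores: Sequence[float], labels: Sequence[int],
                      indicator: Callable[[float], float]) -> float:
    """AP of one query; labels[i] == 1 marks a relevant gallery item."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    if not positives:
        raise ValueError("need at least one positive item")
    total = 0.0
    for i in positives:
        # Rank = 1 + (soft) count of items scored higher than item i.
        rank_all = 1.0 + sum(indicator(scores[j] - scores[i])
                             for j in range(len(scores)) if j != i)
        rank_pos = 1.0 + sum(indicator(scores[j] - scores[i])
                             for j in positives if j != i)
        total += rank_pos / rank_all
    return total / len(positives)

if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    a = [0.9, 0.8, 0.7, 0.6]
    b = [0.9, 0.8, 0.71, 0.6]  # same ordering, one score nudged
    print(average_precision(a, labels, step), average_precision(b, labels, step))
    print(average_precision(a, labels, smooth_step),
          average_precision(b, labels, smooth_step))
```

With the hard step, nudging a score without changing the ordering leaves AP exactly unchanged, so there is no useful gradient. The sigmoid version stays close to the true AP but responds to small score changes.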

Self-supervised Video Object Segmentation

no code implementations 22 Jun 2020 Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie

The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a.

Object One-shot visual object segmentation +4

VGGSound: A Large-scale Audio-Visual Dataset

3 code implementations 29 Apr 2020 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.

Image Classification

MAST: A Memory-Augmented Self-supervised Tracker

2 code implementations CVPR 2020 Zihang Lai, Erika Lu, Weidi Xie

Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still lags far behind supervised methods.

Semantic Segmentation Semi-Supervised Video Object Segmentation +2

VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge

no code implementations 5 Dec 2019 Joon Son Chung, Arsha Nagrani, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A. Reynolds, Andrew Zisserman

The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or 'in the wild' data.

Speaker Recognition

Video Representation Learning by Dense Predictive Coding

1 code implementation 10 Sep 2019 Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.

Representation Learning Self-Supervised Action Recognition +3

AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations

no code implementations 14 Aug 2019 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.

Object

Self-supervised Learning for Video Correspondence Flow

1 code implementation 2 May 2019 Zihang Lai, Weidi Xie

Fourth, to shed light on the potential of self-supervised learning for video correspondence flow, we probe the upper bound by training on additional data, i.e. more diverse videos, further demonstrating significant improvements on video segmentation.

Self-Supervised Learning Semi-Supervised Video Object Segmentation +4

Utterance-level Aggregation For Speaker Recognition In The Wild

10 code implementations 26 Feb 2019 Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman

The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and may also contain irrelevant signals.

Speaker Recognition Text-Independent Speaker Verification

Class-Agnostic Counting

1 code implementation 1 Nov 2018 Erika Lu, Weidi Xie, Andrew Zisserman

The model achieves competitive performance on cell and crowd counting datasets, and surpasses the state-of-the-art on the car dataset using only three training images.

Crowd Counting Few-Shot Learning +2

Comparator Networks

no code implementations ECCV 2018 Weidi Xie, Li Shen, Andrew Zisserman

Our contributions are: (i) we propose a Deep Comparator Network (DCN) that can ingest a pair of sets (each of which may contain a variable number of images) as inputs and compute a similarity between the pair, which involves attending to multiple discriminative local regions (landmarks) and comparing local descriptors between pairs of faces; (ii) to encourage high-quality representations for each set, internal competition is introduced for recalibration based on the landmark score; (iii) inspired by image retrieval, a novel hard sample mining regime is proposed to control the sampling process, such that the DCN is complementary to standard image classification models.

Face Recognition Image Classification +2

Multicolumn Networks for Face Recognition

1 code implementation 24 Jul 2018 Weidi Xie, Andrew Zisserman

In this paper, we design a neural network architecture that learns to aggregate based on both "visual" quality (resolution, illumination), and "content" quality (relative importance for discriminative classification).

Face Identification Face Recognition +2
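The aggregation idea above can be illustrated with a hypothetical quality-weighted average pooling that turns a variable-size set of face descriptors into one vector. In the paper the quality scores are learned by the network. Here they are simply passed in as inputs, and combining the "visual" and "content" scores by multiplication is an assumption made for this sketch, not a rule stated in the listing. The function name `aggregate_set` is likewise hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: quality scores are supplied as inputs, and combining
# them by multiplication is an assumption, not the paper's stated rule.

def aggregate_set(features: Sequence[Sequence[float]],
                  visual_quality: Sequence[float],
                  content_quality: Sequence[float]) -> List[float]:
    """Pool a variable-size set of per-image descriptors into one L2-normalised vector."""
    if not features or not (len(features) == len(visual_quality) == len(content_quality)):
        raise ValueError("need a non-empty set with one pair of quality scores per image")
    weights = [v * c for v, c in zip(visual_quality, content_quality)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one image must have positive weight")
    dim = len(features[0])
    # Weighted mean: low-quality images contribute little to the set descriptor.
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) / total
              for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled))
    if norm == 0:
        raise ValueError("pooled descriptor is the zero vector")
    return [x / norm for x in pooled]

if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    print(aggregate_set(feats, [0.9, 0.1, 0.5], [1.0, 1.0, 0.2]))
```

Whatever the number of images in the set, the output has the same dimension as a single descriptor. Two sets of different sizes can therefore be compared directly, for example by a dot product.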

Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks

no code implementations 3 Nov 2017 Davis M. Vigneault, Weidi Xie, Carolyn Y. Ho, David A. Bluemke, J. Alison Noble

Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses.

Image Segmentation Segmentation +1

VGGFace2: A dataset for recognising faces across pose and age

23 code implementations 23 Oct 2017 Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman

The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition Face Verification +1

Freehand Ultrasound Image Simulation with Spatially-Conditioned Generative Adversarial Networks

no code implementations 17 Jul 2017 Yipeng Hu, Eli Gibson, Li-Lin Lee, Weidi Xie, Dean C. Barratt, Tom Vercauteren, J. Alison Noble

Sonography synthesis has a wide range of applications, including medical procedure simulation, clinical training and multimodality image registration.

Anatomy Image Registration +1

Feature Tracking Cardiac Magnetic Resonance via Deep Learning and Spline Optimization

no code implementations 12 Apr 2017 Davis M. Vigneault, Weidi Xie, David A. Bluemke, J. Alison Noble

Feature-tracking cardiac magnetic resonance (CMR) has recently emerged as an area of interest for quantifying regional cardiac function from balanced steady-state free precession (SSFP) cine sequences.
