Search Results for author: Xiaowei Hu

Found 48 papers, 28 papers with code

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

1 code implementation21 Nov 2024 Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu, Yanjun Li, Pengcheng Chen, Xiaowei Hu, Zhongying Deng, Yuanfeng Ji, Jin Ye, Yu Qiao, Junjun He

Despite significant advancements in general artificial intelligence models such as GPT-4, their effectiveness in the medical domain (general medical AI, GMAI) remains constrained due to the absence of specialized medical knowledge.

Decision Making Language Modelling +2

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

no code implementations17 Oct 2024 Donghao Zhou, Jiancheng Huang, Jinbin Bai, Jiaze Wang, Hao Chen, Guangyong Chen, Xiaowei Hu, Pheng-Ann Heng

Recent advancements in text-to-image (T2I) diffusion models have enabled the creation of high-quality images from text prompts, but they still struggle to generate images with precise control over specific visual concepts.

Image Generation

Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models

1 code implementation3 Sep 2024 Jiaqi Xu, Mengyang Wu, Xiaowei Hu, Chi-Wing Fu, Qi Dou, Pheng-Ann Heng

For clearness enhancement, we use real-world data, utilizing a dual-step strategy with pseudo-labels assessed by vision-language models and weather prompt learning.

Image Restoration Language Modelling
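
The snippet above mentions pseudo-labels assessed by vision-language models. The sketch below is a hypothetical illustration of that idea using CLIP from Hugging Face Transformers to pick the most "clear-weather" of several restored candidates; the model name, prompt, and selection rule are assumptions, not the paper's actual assessment procedure.

```python
# Hedged sketch: use a vision-language model (CLIP here, as an assumption)
# to score restored candidates against a clear-weather text prompt and keep
# the best one as a pseudo-label. Not the paper's exact pipeline.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pick_pseudo_label(candidates, prompt="a clear photo taken in good weather"):
    # candidates: list of PIL images produced by different restoration models
    inputs = processor(text=[prompt], images=candidates, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    scores = out.logits_per_image.squeeze(1)  # image-text similarity per candidate
    return candidates[int(scores.argmax())]
```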

TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios

no code implementations30 Nov 2023 Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

Multi-object tracking in traffic videos is a crucial research area, offering immense potential for enhancing traffic monitoring accuracy and promoting road safety measures through the utilisation of advanced machine learning algorithms.

Multi-Object Tracking Object

IDRNet: Intervention-Driven Relation Network for Semantic Segmentation

1 code implementation NeurIPS 2023 Zhenchao Jin, Xiaowei Hu, Lingting Zhu, Luchuan Song, Li Yuan, Lequan Yu

Next, a deletion diagnostics procedure is conducted to model the relations among these semantic-level representations by perceiving the network outputs, and the extracted relations are used to guide the semantic-level representations to interact with each other.

Relation Relation Network +1

SILT: Shadow-aware Iterative Label Tuning for Learning to Detect Shadows from Noisy Labels

1 code implementation ICCV 2023 Han Yang, Tianyu Wang, Xiaowei Hu, Chi-Wing Fu

Existing shadow detection datasets often contain missing or mislabeled shadows, which can hinder the performance of deep learning models trained directly on such data.

Shadow Detection

Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions

no code implementations CVPR 2023 Yurui Zhu, Tianyu Wang, Xueyang Fu, Xuanyu Yang, Xin Guo, Jifeng Dai, Yu Qiao, Xiaowei Hu

Inspired by this observation, we design an efficient unified framework with a two-stage training strategy to explore the weather-general and weather-specific features.

Image Restoration

Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation

1 code implementation CVPR 2023 Min Shi, Zihao Huang, Xianzheng Ma, Xiaowei Hu, Zhiguo Cao

To calibrate the inaccurate matching results, we introduce a two-stage framework, where matched keypoints from the first stage are viewed as similarity-aware position proposals.

Category-Agnostic Pose Estimation Decoder +1

Video Instance Shadow Detection Under the Sun and Sky

1 code implementation23 Nov 2022 Zhenghao Xing, Tianyu Wang, Xiaowei Hu, Haoran Wu, Chi-Wing Fu, Pheng-Ann Heng

Instance shadow detection, crucial for applications such as photo editing and light direction estimation, has undergone significant advancements in predicting shadow instances, object instances, and their associations.

Contrastive Learning Instance Shadow Detection +3

Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection

1 code implementation23 Nov 2022 Tianyu Wang, Xiaowei Hu, Zhengzhe Liu, Chi-Wing Fu

Importantly, we formulate the lightweight plug-in S2D module and the point cloud reconstruction module in SDet to densify 3D features and train SDet to produce 3D features, following the dense 3D features in DDet.

3D Object Detection Domain Adaptation +2

Demystify Transformers & Convolutions in Modern Image Deep Networks

1 code implementation10 Nov 2022 Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.

Image Deep Networks Spatial Token Mixer

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

3 code implementations CVPR 2023 Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.

 Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)

Classification Image Classification +3

Learning Shadow Correspondence for Video Shadow Detection

no code implementations30 Jul 2022 Xinpeng Ding, Jingweng Yang, Xiaowei Hu, Xiaomeng Li

We further design a new evaluation metric to evaluate the temporal stability of the video shadow detection results.

Shadow Detection Video Shadow Detection

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

1 code implementation20 Jul 2022 Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.

Image Outpainting Text-to-Image Generation +1

Instance Shadow Detection with A Single-Stage Detector

2 code implementations11 Jul 2022 Tianyu Wang, Xiaowei Hu, Pheng-Ann Heng, Chi-Wing Fu

This paper formulates a new problem, instance shadow detection, which aims to detect shadow instances and the associated object instances that cast each shadow in the input image.

Instance Shadow Detection Object +2

Dynamic Message Propagation Network for RGB-D Salient Object Detection

no code implementations20 Jun 2022 Baian Chen, Zhilei Chen, Xiaowei Hu, Jun Xu, Haoran Xie, Mingqiang Wei, Jing Qin

This paper presents a novel deep neural network framework for RGB-D salient object detection by controlling the message passing between the RGB images and depth maps on the feature level and exploring the long-range semantic contexts and geometric information on both RGB and depth features to infer salient objects.

object-detection RGB-D Salient Object Detection +1

GLIPv2: Unifying Localization and Vision-Language Understanding

1 code implementation12 Jun 2022 Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao

We present GLIPv2, a grounded VL understanding model that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning).

 Ranked #1 on Phrase Grounding on Flickr30k Entities Test (using extra training data)

Contrastive Learning Image Captioning +7

GIT: A Generative Image-to-text Transformer for Vision and Language

1 code implementation27 May 2022 JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Decoder Image Captioning +9

Spatial Autoregressive Coding for Graph Neural Recommendation

no code implementations19 May 2022 Jiayi Zheng, Ling Yang, Heyuan Wang, Cheng Yang, Yinghong Li, Xiaowei Hu, Shenda Hong

To adequately leverage neighbor proximity and high-order information, we design a novel spatial autoregressive paradigm.

Graph Embedding

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations20 Apr 2022 Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4
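
As a rough illustration of the external-knowledge enrichment described above, the sketch below appends a WordNet gloss to a class name using NLTK; the exact knowledge sources, selection, and prompt format used in K-LITE may differ.

```python
# Hedged sketch: enrich a visual concept name with external knowledge
# (here a WordNet gloss via NLTK); K-LITE also uses Wiktionary and its own
# formatting, which this simplification does not reproduce.
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)  # one-time corpus download

def enrich_with_wordnet(concept: str) -> str:
    synsets = wordnet.synsets(concept.replace(" ", "_"))
    if not synsets:
        return concept  # fall back to the bare class name
    gloss = synsets[0].definition()
    return f"{concept}, which is {gloss}"

print(enrich_with_wordnet("aircraft carrier"))
```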

Enhancing Pseudo Label Quality for Semi-Supervised Domain-Generalized Medical Image Segmentation

1 code implementation21 Jan 2022 Huifeng Yao, Xiaowei Hu, Xiaomeng Li

With these augmentations as perturbations, we feed the input to a confidence-aware cross pseudo supervision network to measure the variance of pseudo labels and regularize the network to learn with more confident pseudo labels.

Image Segmentation Medical Image Segmentation +2
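
The confidence-aware cross pseudo supervision mentioned above can be pictured, in simplified form, as two segmentation networks supervising each other with pseudo labels weighted by prediction confidence. The loss below is a sketch of that idea, not the paper's exact formulation.

```python
# Minimal sketch of confidence-weighted cross pseudo supervision (an
# illustrative simplification, not the paper's exact loss).
import torch
import torch.nn.functional as F

def confidence_aware_cps_loss(logits_a, logits_b):
    # logits_*: (B, C, H, W) outputs of two differently perturbed networks
    prob_a, prob_b = logits_a.softmax(1), logits_b.softmax(1)
    conf_a, pl_a = prob_a.detach().max(1)  # per-pixel confidence and pseudo label
    conf_b, pl_b = prob_b.detach().max(1)
    loss_a = (conf_b * F.cross_entropy(logits_a, pl_b, reduction="none")).mean()
    loss_b = (conf_a * F.cross_entropy(logits_b, pl_a, reduction="none")).mean()
    return loss_a + loss_b
```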

Injecting Semantic Concepts into End-to-End Image Captioning

1 code implementation CVPR 2022 Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision-transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting regional features.

Caption Generation Image Captioning

Scaling Up Vision-Language Pre-training for Image Captioning

no code implementations CVPR 2022 Xiaowei Hu, Zhe Gan, JianFeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.

Ranked #3 on Image Captioning on nocaps-XD entire (using extra training data)

Attribute Image Captioning

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

1 code implementation23 Nov 2021 Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

On grounded captioning, UniTAB presents a simpler solution with a single output head, and significantly outperforms the state of the art in both grounding and captioning evaluations.

Image Captioning Language Modelling +5

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

no code implementations19 Nov 2021 JianFeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question), for vision-language (VL) representation learning.

Image Captioning Image-text matching +9

Strategic Inventories in a Supply Chain with Downstream Cournot Duopoly

no code implementations14 Sep 2021 Xiaowei Hu, Jaejin Jang, Nabeel Hamoud, Amirsaman Bajgiran

Inventories carried in a supply chain as a strategic tool to influence competing firms are known as strategic inventories (SI).

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

1 code implementation10 Sep 2021 Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.

Ranked #21 on Visual Question Answering (VQA) on OK-VQA (using extra training data)

Image Captioning Question Answering +2
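
As a concrete illustration of prompting GPT-3 with image captions, the sketch below assembles a few-shot text prompt from caption/question/answer triples; the exact template, in-context example selection, and decoding settings used in PICa are assumptions here.

```python
# Hedged sketch of a PICa-style prompt: the image is represented only by its
# caption, and a few caption/question/answer examples precede the test query.
def build_pica_prompt(examples, test_caption, test_question):
    lines = ["Please answer the question according to the context."]
    for caption, question, answer in examples:
        lines += [f"Context: {caption}", f"Question: {question}", f"Answer: {answer}", ""]
    lines += [f"Context: {test_caption}", f"Question: {test_question}", "Answer:"]
    return "\n".join(lines)

prompt = build_pica_prompt(
    examples=[("a man riding a wave on a surfboard", "what sport is this?", "surfing")],
    test_caption="a plate with a fork next to a slice of cake",
    test_question="what utensil is shown?",
)
print(prompt)  # send this text to the language model of your choice
```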

Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations ICCV 2021 Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.

Image Captioning Knowledge Distillation +2

Global Guidance Network for Breast Lesion Segmentation in Ultrasound Images

no code implementations5 Apr 2021 Cheng Xue, Lei Zhu, Huazhu Fu, Xiaowei Hu, Xiaomeng Li, Hai Zhang, Pheng Ann Heng

The BD modules learn an additional breast lesion boundary map to enhance the boundary quality of the refined segmentation results.

Boundary Detection Image Segmentation +3

Deep Texture-Aware Features for Camouflaged Object Detection

no code implementations5 Feb 2021 Jingjing Ren, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Yangyang Xu, Weiming Wang, Zijun Deng, Pheng-Ann Heng

Camouflaged object detection is a challenging task that aims to identify objects having similar texture to the surroundings.

Object object-detection +1

Relief and Stimulus in A Cross-sector Multi-product Scarce Resource Supply Chain Network

1 code implementation22 Jan 2021 Xiaowei Hu, Peng Li

In the era of a growing population, systemic changes to the world, and the rising risk of crises, humanity has been facing an unprecedented challenge of resource scarcity.

Incorporating Vision Bias into Click Models for Image-oriented Search Engine

no code implementations7 Jan 2021 Ningxin Xu, Cheng Yang, Yixin Zhu, Xiaowei Hu, Changhu Wang

Most typical click models assume that the probability of a document being examined by users depends only on its position, as in PBM and UBM.

Position
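
The position-based examination assumption referenced above (as in PBM) can be written in a couple of lines: a click requires that the document is examined, with probability depending only on its rank, and that it is relevant. A minimal numeric sketch:

```python
# Minimal sketch of the position-based model (PBM) assumption: the chance a
# document is examined depends only on its position, independent of content.
def pbm_click_prob(position, relevance, exam_prob_by_pos):
    return exam_prob_by_pos[position] * relevance

exam_prob_by_pos = {1: 0.9, 2: 0.6, 3: 0.4}  # illustrative examination curve
print(pbm_click_prob(position=2, relevance=0.7, exam_prob_by_pos=exam_prob_by_pos))  # 0.42
```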

VinVL: Revisiting Visual Representations in Vision-Language Models

7 code implementations CVPR 2021 Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao

In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, OSCAR, and utilize an improved approach, OSCAR+, to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks.

Image Captioning Image-text matching +4

MiniVLM: A Smaller and Faster Vision-Language Model

no code implementations13 Dec 2020 JianFeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by 95%, compared to a baseline model.

Language Modelling

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

no code implementations28 Sep 2020 Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).

Image Captioning Object +1

Probabilistic Multilayer Regularization Network for Unsupervised 3D Brain Image Registration

no code implementations3 Jul 2019 Lihao Liu, Xiaowei Hu, Lei Zhu, Pheng-Ann Heng

This paper presents a novel framework for unsupervised 3D brain image registration by capturing the feature-level transformation relationships between the unaligned image and reference image.

Image Registration

Depth-Attentional Features for Single-Image Rain Removal

no code implementations CVPR 2019 Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Pheng-Ann Heng

Rain is a common weather phenomenon in which object visibility varies with depth from the camera, and faraway objects are visually blocked more by fog than by rain streaks.

Single Image Deraining

Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data

5 code implementations ICCV 2019 Xiaowei Hu, Yitong Jiang, Chi-Wing Fu, Pheng-Ann Heng

This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples.

Shadow Removal
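
One ingredient of Mask-ShadowGAN is a shadow mask derived from the model's own outputs. The sketch below shows one plausible way to obtain such a mask, by thresholding the difference between the generated shadow-free image and the input shadow image with Otsu's method; treat the binarization details as an assumption rather than the paper's exact procedure.

```python
# Hedged sketch: derive a binary shadow mask from the difference between a
# generated shadow-free image and the input shadow image (Otsu threshold is
# an assumption; see the paper for the exact procedure).
import numpy as np
from skimage.filters import threshold_otsu

def shadow_mask(shadow_img, shadow_free_img):
    diff = shadow_free_img.astype(np.float32) - shadow_img.astype(np.float32)
    gray = diff.mean(axis=-1)            # collapse color channels
    t = threshold_otsu(gray)
    return (gray > t).astype(np.uint8)   # 1 where the region was brightened, i.e. shadow
```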

SAC-Net: Spatial Attenuation Context for Salient Object Detection

no code implementations25 Mar 2019 Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Tianyu Wang, Pheng-Ann Heng

This paper presents a new deep neural network design for salient object detection by maximizing the integration of local and global image context within, around, and beyond the salient objects.

Object object-detection +2

Direction-aware Spatial Context Features for Shadow Detection and Removal

2 code implementations12 May 2018 Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Jing Qin, Pheng-Ann Heng

This paper presents a novel deep neural network design for shadow detection and removal by analyzing the spatial image context in a direction-aware manner.

Shadow Detection And Removal Shadow Removal

SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection

no code implementations2 Apr 2018 Xiaowei Hu, Xuemiao Xu, Yongjie Xiao, Hao Chen, Shengfeng He, Jing Qin, Pheng-Ann Heng

Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detection of vehicles with a large variance of scales.

Fast Vehicle Detection object-detection +1

Direction-aware Spatial Context Features for Shadow Detection

2 code implementations CVPR 2018 Xiaowei Hu, Lei Zhu, Chi-Wing Fu, Jing Qin, Pheng-Ann Heng

To achieve this, we first formulate the direction-aware attention mechanism in a spatial recurrent neural network (RNN) by introducing attention weights when aggregating spatial context features in the RNN.

Detecting Shadows Shadow Detection
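
To make the direction-aware aggregation above concrete, the PyTorch sketch below sweeps features along a single direction (left to right) with a recurrent update whose contribution is scaled by a learned attention map; the full model aggregates over four directions in two rounds, which this simplification omits.

```python
# Simplified sketch (one direction only, my own construction): recurrent
# aggregation of spatial context along the width dimension, modulated by
# learned attention weights, in the spirit of the direction-aware module.
import torch
import torch.nn as nn

class OneDirectionContext(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.recur = nn.Conv2d(channels, channels, kernel_size=1)            # recurrent translation
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())   # attention weights

    def forward(self, x):                      # x: (B, C, H, W)
        weights = self.attn(x).unbind(dim=3)   # one (B, 1, H) weight map per column
        cols = x.unbind(dim=3)                 # one (B, C, H) slice per column
        out = [cols[0]]
        for i in range(1, len(cols)):
            ctx = torch.relu(self.recur(out[-1].unsqueeze(3)).squeeze(3))
            out.append(cols[i] + weights[i] * ctx)   # attention-weighted context
        return torch.stack(out, dim=3)

features = torch.randn(2, 32, 64, 64)
print(OneDirectionContext(32)(features).shape)   # torch.Size([2, 32, 64, 64])
```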

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

no code implementations22 Sep 2016 Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
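
The noisy gradient estimates referred to above are typically built by perturbing the query point and differencing function values. The sketch below shows the classic two-point spherical estimator as a generic example; it is not the specific biased oracle model analyzed in the paper.

```python
# Generic two-point bandit gradient estimator (illustrative; the paper studies
# abstract biased noisy gradient oracles rather than this specific construction).
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng=None):
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                 # uniform direction on the unit sphere
    return d / (2.0 * delta) * (f(x + delta * u) - f(x - delta * u)) * u

grad = two_point_gradient_estimate(lambda z: float(np.sum(z**2)), np.ones(3), delta=0.01)
print(grad)  # noisy estimate of the true gradient [2, 2, 2]
```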
