no code implementations • 19 May 2022 • Jiayi Zheng, Ling Yang, Heyuan Wang, Cheng Yang, Yinghong Li, Xiaowei Hu, Shenda Hong
To adequately leverage neighbor proximity and high-order information, we design a novel spatial autoregressive paradigm.
no code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Anna Rohrbach, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Jianfeng Gao
In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge to build transferable visual systems. In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that can understand both visual concepts and their knowledge. In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models.
1 code implementation • 21 Jan 2022 • Huifeng Yao, Xiaowei Hu, Xiaomeng Li
With these augmentations as perturbations, we feed the input to a confidence-aware cross pseudo supervision network to measure the variance of pseudo labels and regularize the network to learn with more confident pseudo labels.
no code implementations • 9 Dec 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting regional features.
no code implementations • 24 Nov 2021 • Xiaowei Hu, Zhe Gan, JianFeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang
In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.
Ranked #1 on Image Captioning on nocaps-val-overall
no code implementations • 23 Nov 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang
In this paper, we propose UNICORN, a vision-language (VL) model that unifies text generation and bounding box prediction into a single architecture.
no code implementations • 19 Nov 2021 • JianFeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang
In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question), for vision-language (VL) representation learning.
no code implementations • 14 Sep 2021 • Xiaowei Hu, Jaejin Jang, Nabeel Hamoud, Amirsaman Bajgiran
Inventories carried in a supply chain as a strategic tool to influence competing firms are considered to be strategic inventories (SI).
no code implementations • 10 Sep 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang
To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.
1 code implementation • CVPR 2021 • Tianyu Wang, Xiaowei Hu, Chi-Wing Fu, Pheng-Ann Heng
Instance shadow detection aims to find shadow instances paired with the objects that cast the shadows.
Ranked #1 on Instance Shadow Detection on SOBA
no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.
no code implementations • 5 Apr 2021 • Cheng Xue, Lei Zhu, Huazhu Fu, Xiaowei Hu, Xiaomeng Li, Hai Zhang, Pheng Ann Heng
The BD modules learn an additional breast lesion boundary map to enhance the boundary quality of the refined segmentation result.
no code implementations • 5 Feb 2021 • Jingjing Ren, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Yangyang Xu, Weiming Wang, Zijun Deng, Pheng-Ann Heng
Camouflaged object detection is a challenging task that aims to identify objects having similar texture to the surroundings.
no code implementations • 7 Jan 2021 • Ningxin Xu, Cheng Yang, Yixin Zhu, Xiaowei Hu, Changhu Wang
Most typical click models, such as PBM and UBM, assume that the probability of a document being examined by users depends only on its position.
6 code implementations • CVPR 2021 • Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao
In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, OSCAR (Li et al., 2020), and utilize an improved approach, OSCAR+, to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks.
Ranked #5 on Visual Question Answering on VQA v2 test-std
no code implementations • 13 Dec 2020 • JianFeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by 95%, compared to a baseline model.
no code implementations • 28 Sep 2020 • Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).
Ranked #1 on Image Captioning on nocaps-XD out-of-domain
2 code implementations • 16 Nov 2019 • Xiaowei Hu, Tianyu Wang, Chi-Wing Fu, Yitong Jiang, Qiong Wang, Pheng-Ann Heng
Shadow detection in general photos is a nontrivial problem, due to the complexity of the real world.
no code implementations • 3 Jul 2019 • Lihao Liu, Xiaowei Hu, Lei Zhu, Pheng-Ann Heng
This paper presents a novel framework for unsupervised 3D brain image registration by capturing the feature-level transformation relationships between the unaligned image and reference image.
no code implementations • CVPR 2019 • Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Pheng-Ann Heng
Rain is a common weather phenomenon, where object visibility varies with depth from the camera and faraway objects are visually blocked more by fog than by rain streaks.
Ranked #4 on Single Image Deraining on RainCityscapes
4 code implementations • ICCV 2019 • Xiaowei Hu, Yitong Jiang, Chi-Wing Fu, Pheng-Ann Heng
This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples.
no code implementations • 25 Mar 2019 • Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Tianyu Wang, Pheng-Ann Heng
This paper presents a new deep neural network design for salient object detection by maximizing the integration of local and global image context within, around, and beyond the salient objects.
1 code implementation • 12 May 2018 • Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Jing Qin, Pheng-Ann Heng
This paper presents a novel deep neural network design for shadow detection and removal by analyzing the spatial image context in a direction-aware manner.
Ranked #3 on Shadow Removal on ISTD
no code implementations • 2 Apr 2018 • Xiaowei Hu, Xuemiao Xu, Yongjie Xiao, Hao Chen, Shengfeng He, Jing Qin, Pheng-Ann Heng
Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detection of vehicles with a large variance of scales.
1 code implementation • CVPR 2018 • Xiaowei Hu, Lei Zhu, Chi-Wing Fu, Jing Qin, Pheng-Ann Heng
To achieve this, we first formulate the direction-aware attention mechanism in a spatial recurrent neural network (RNN) by introducing attention weights when aggregating spatial context features in the RNN.
Ranked #2 on RGB Salient Object Detection on SBU
no code implementations • 22 Sep 2016 • Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári
Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
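As a rough illustration of the idea in the last entry, the sketch below builds a gradient estimate from function evaluations only and plugs it into plain gradient descent in place of the true gradient. It uses a generic two-point estimate along a random unit direction, which is a common textbook construction and not necessarily the estimator studied in the paper. All names (`two_point_gradient_estimate`, `zeroth_order_descent`, `quadratic`) and all parameter values are assumptions made for this example.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere (normalized Gaussians)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate the gradient of f at x from two evaluations of f.

    The estimate is random because the probing direction u is random;
    for smooth f its expectation is close to the true gradient.
    """
    dim = len(x)
    u = random_unit_vector(dim, rng)
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = dim * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def zeroth_order_descent(f, x0, steps=500, step_size=0.05, delta=1e-3, seed=0):
    """Gradient descent in which the estimate replaces the actual gradient."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_gradient_estimate(f, x, delta, rng)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    """Hypothetical test objective with its minimum at (1, 1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


x_final = zeroth_order_descent(quadratic, [0.0, 0.0, 0.0])
```

The step size here is hand-picked for this toy objective; in the bandit setting the step size and the perturbation radius `delta` typically have to be tuned together to trade off the bias and variance of the estimate.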