1 code implementation • ECCV 2020 • Yukang Wang, Wei Zhou, Tao Jiang, Xiang Bai, Yongchao Xu
In this paper, different from previous methods performing knowledge distillation for densely pairwise relations, we propose a novel intra-class feature variation distillation (IFVD) to transfer the intra-class feature variation (IFV) of the cumbersome model (teacher) to the compact model (student).
no code implementations • 28 Nov 2023 • Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai
We contend that one main limitation of existing generation methods is the insufficient integration of foreground text with the background.
1 code implementation • 11 Nov 2023 • Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai
Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.
1 code implementation • 23 Oct 2023 • Wei Chen, Qiushi Wang, Zefei Long, Xianyin Zhang, Zhongtian Lu, Bingxuan Li, Siyuan Wang, Jiarong Xu, Xiang Bai, Xuanjing Huang, Zhongyu Wei
We propose Multiple Experts Fine-tuning Framework to build a financial large language model (LLM), DISC-FinLLM.
no code implementations • 12 Oct 2023 • Zijie Wu, Chaohui Yu, Zhen Zhu, Fan Wang, Xiang Bai
To utilize the abundant visual priors in the off-the-shelf T2I models, a series of methods try to invert an image to proper embedding that aligns with the semantic space of the T2I model.
1 code implementation • 11 Oct 2023 • Yuxuan Cai, Dingkang Liang, Dongliang Luo, Xinwei He, Xin Yang, Xiang Bai
To alleviate this issue, we present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies across different anomaly detection benchmarks.
no code implementations • 5 Sep 2023 • Xin Zhou, Jinghua Hou, Tingting Yao, Dingkang Liang, Zhe Liu, Zhikang Zou, Xiaoqing Ye, Jianwei Cheng, Xiang Bai
3D object detection is an essential task for achieving autonomous driving.
no code implementations • 21 Aug 2023 • Zhuang Liu, Ye Yuan, Zhilong Ji, Jingfeng Bai, Xiang Bai
Then we design a semantic aware module (SAM), which projects the visual and classification feature into semantic space.
1 code implementation • 21 Aug 2023 • Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai
Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.
3 code implementations • ICCV 2023 • Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin
To this end, we introduce a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
1 code implementation • 8 Jun 2023 • Zelin Liu, Xinggang Wang, Cheng Wang, Wenyu Liu, Xiang Bai
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)
1 code implementation • 6 Jun 2023 • Wenwen Yu, MingYu Liu, Biao Yang, Enming Zhang, Deqiang Jiang, Xing Sun, Yuliang Liu, Xiang Bai
Text recognition in the wild is a long-standing problem in computer vision.
no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai
It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.
1 code implementation • 4 Jun 2023 • Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai
In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks.
1 code implementation • 13 May 2023 • Yuliang Liu, Zhang Li, Hongliang Li, Wenwen Yu, Yang Liu, Biao Yang, Mingxin Huang, Dezhi Peng, MingYu Liu, Mingrui Chen, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning.
1 code implementation • 12 May 2023 • Zhe Liu, Xiaoqing Ye, Zhikang Zou, Xinwei He, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai
Extensive experiments on the nuScenes dataset demonstrate that our method is much more stable in dealing with challenging cases such as asynchronous sensors, misaligned sensor placement, and degenerated camera images than existing fusion methods.
Ranked #47 on 3D Object Detection on nuScenes
1 code implementation • 12 May 2023 • Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai
We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods has a distinguishable drop from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance of layout and entities.
1 code implementation • 5 May 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai
Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i.e., visual representation, while text is omnipresent in human environments and frequently critical to understanding video.
no code implementations • 24 Apr 2023 • Wenwen Yu, MingYu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai
To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2).
1 code implementation • CVPR 2023 • Wei Hua, Dingkang Liang, Jingyu Li, Xiaolong Liu, Zhikang Zou, Xiaoqing Ye, Xiang Bai
Semi-Supervised Object Detection (SSOD), aiming to explore unlabeled data for boosting object detectors, has become an active task in recent years.
no code implementations • 10 Apr 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai
In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios.
2 code implementations • CVPR 2023 • Dingkang Liang, Jiahao Xie, Zhikang Zou, Xiaoqing Ye, Wei Xu, Xiang Bai
To the best of our knowledge, CrowdCLIP is the first to investigate the vision language knowledge to solve the counting problem.
Ranked #1 on Cross-Part Crowd Counting on ShanghaiTech B
no code implementations • CVPR 2023 • Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao
As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.
2 code implementations • CVPR 2023 • Kaixin Xiong, Shi Gong, Xiaoqing Ye, Xiao Tan, Ji Wan, Errui Ding, Jingdong Wang, Xiang Bai
In this paper, we address the problem of detecting 3D objects from multi-view images.
Ranked #7 on 3D Object Detection on nuScenes Camera Only
1 code implementation • CVPR 2023 • Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai
A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies on appearance similarity and hence is often inaccurate under occlusion and fast movement.
1 code implementation • CVPR 2023 • Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai
Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.
3 code implementations • CVPR 2023 • Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai
A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.
no code implementations • 4 Jan 2023 • Zhe Liu, Xiaoqing Ye, Xiao Tan, Errui Ding, Xiang Bai
In this paper, we propose a cross-modal distillation method named StereoDistill to narrow the gap between the stereo and LiDAR-based approaches via distilling the stereo detectors from the superior LiDAR model at the response level, which is usually overlooked in 3D object detection distillation.
3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.
Ranked #14 on Text Spotting on ICDAR 2015
no code implementations • ICCV 2023 • Dingyuan Zhang, Dingkang Liang, Zhikang Zou, Jingyu Li, Xiaoqing Ye, Zhe Liu, Xiao Tan, Xiang Bai
Advanced 3D object detection methods usually rely on large-scale, elaborately labeled datasets to achieve good performance.
no code implementations • 18 Nov 2022 • Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai
This technical report describes our 2nd-place solution for the ECCV 2022 YouTube-VIS Long Video Challenge.
1 code implementation • 12 Nov 2022 • Tianyi Shi, Xiaohuan Ding, Wei Zhou, Feng Pan, Zengqiang Yan, Xiang Bai, Xin Yang
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
no code implementations • 27 Sep 2022 • Hui Zhang, Quanming Yao, James T. Kwok, Xiang Bai
We design a domain-specific search space by exploring principles for having good feature extractors.
Neural Architecture Search
Vocal Bursts Intensity Prediction
1 code implementation • 31 Jul 2022 • Xudong Xie, Ling Fu, Zhifei Zhang, Zhaowen Wang, Xiang Bai
Thirdly, we utilize Transformer to learn the global feature on image-level and model the global relationship of the corner points, with the assistance of a corner-query cross-attention mechanism.
no code implementations • 25 Jul 2022 • Jingqun Tang, Wenming Qian, Luchuan Song, Xiena Dong, Lan Li, Xiang Bai
Text detection and recognition are essential components of a modern OCR system.
1 code implementation • 23 Jul 2022 • Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai
Recently, most handwritten mathematical expression recognition (HMER) methods adopt the encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.
1 code implementation • 21 Jul 2022 • Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai
In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.
Ranked #6 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)
1 code implementation • 11 Jul 2022 • Zijie Wu, Zhen Zhu, Junping Du, Xiang Bai
CCPL can preserve the coherence of the content source during style transfer without degrading stylization.
1 code implementation • 1 Jul 2022 • Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai
Inspired by the observation that humans learn to recognize the texts through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.
2 code implementations • CVPR 2022 • Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao
In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.
no code implementations • 16 Apr 2022 • Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving for its powerful spatial representation ability.
1 code implementation • CVPR 2022 • Xiaolong Liu, Song Bai, Xiang Bai
Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification, and only the detection head upon the encoder is optimized for TAD.
Ranked #13 on Temporal Action Localization on THUMOS’14
no code implementations • CVPR 2022 • Jingqun Tang, Wenqing Zhang, Hongye Liu, Mingkun Yang, Bo Jiang, Guanglong Hu, Xiang Bai
Different from previous approaches that learn robust deep representations of scene text in a holistic manner, our method performs scene text detection based on a few representative features, which avoids the disturbance by background and reduces the computational cost.
Ranked #17 on Object Detection In Aerial Images on DOTA (using extra training data)
1 code implementation • CVPR 2022 • Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu
Recently, the semantics of scene text has been proven to be essential in fine-grained image classification.
no code implementations • 23 Mar 2022 • Wondimu Dikubab, Dingkang Liang, Minghui Liao, Xiang Bai
Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages (e.g., Amharic, Tigrinya) in East Africa for more than 120 million people.
2 code implementations • CVPR 2022 • Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai
In this paper, we propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
1 code implementation • 26 Feb 2022 • Dingkang Liang, Wei Xu, Xiang Bai
Crowd localization, predicting head positions, is a more practical and high-level task than simply counting.
5 code implementations • 21 Feb 2022 • Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai
By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.
Ranked #3 on Scene Text Detection on MSRA-TD500
2 code implementations • 29 Dec 2021 • Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai
However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP operates on whole images.
Ranked #2 on Open Vocabulary Semantic Segmentation on Cityscapes
1 code implementation • 21 Dec 2021 • Zhe Liu, Tengteng Huang, Bingling Li, Xiwu Chen, Xi Wang, Xiang Bai
Recently, fusing the LiDAR point cloud and camera image to improve the performance and robustness of 3D object detection has received more and more attention, as these two modalities naturally possess strong complementarity.
1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance.
Ranked #2 on Text Spotting on SCUT-CTW1500
2 code implementations • 15 Dec 2021 • Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai
Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms should be applied to each frame independently.
Ranked #2 on Video Instance Segmentation on HQ-YTVIS
1 code implementation • 9 Dec 2021 • Silin Cheng, Xiwu Chen, Xinwei He, Zhe Liu, Xiang Bai
Learning intra-region contexts and inter-region relations are two effective strategies to strengthen feature representations for point cloud analysis.
Ranked #37 on 3D Point Cloud Classification on ModelNet40
1 code implementation • 18 Nov 2021 • Xiang Bai, Hanchen Wang, Liya Ma, Yongchao Xu, Jiefeng Gan, Ziwei Fan, Fan Yang, Ke Ma, Jiehua Yang, Song Bai, Chang Shu, Xinyu Zou, Renhao Huang, Changzheng Zhang, Xiaowu Liu, Dandan Tu, Chuou Xu, Wenqing Zhang, Xi Wang, Anguo Chen, Yu Zeng, Dehua Yang, Ming-Wei Wang, Nagaraj Holalkere, Neil J. Halin, Ihab R. Kamel, Jia Wu, Xuehua Peng, Xiang Wang, Jianbo Shao, Pattanasak Mongkolwat, Jianjun Zhang, Weiyang Liu, Michael Roberts, Zhongzhao Teng, Lucian Beer, Lorena Escudero Sanchez, Evis Sala, Daniel Rubin, Adrian Weller, Joan Lasenby, Chuangsheng Zheng, Jianming Wang, Zhen Li, Carola-Bibiane Schönlieb, Tian Xia
Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses.
no code implementations • 15 Nov 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai
To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.
1 code implementation • 9 Nov 2021 • Yuzhe Gao, Xing Li, Jiajian Zhang, Yu Zhou, Dian Jin, Jing Wang, Shenggao Zhu, Xiang Bai
We leverage a Siamese Complementary Module to fully exploit the continuity characteristic of the text instances in the temporal dimension, which effectively alleviates the missed detection of the text instances, and hence ensures the completeness of each text trajectory.
1 code implementation • NeurIPS 2021 • Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai
We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.
1 code implementation • 30 Aug 2021 • Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang, Yi Zuo, Licheng Jiao, Chang Meng, Hao Wang, Jiahao Wang, Yiming Hui, Zhuojun Dong, Jie Zhang, Qianyue Bao, Zixiao Zhang, Fang Liu
This report summarizes the results of the Learning to Understand Aerial Images (LUAI) 2021 challenge held at ICCV 2021, which focuses on object detection and semantic segmentation in aerial images.
5 code implementations • 25 Aug 2021 • Dong Wu, Manwen Liao, Weitian Zhang, Xinggang Wang, Xiang Bai, Wenqing Cheng, Wenyu Liu
A panoptic driving perception system is an essential part of autonomous driving.
Ranked #2 on Drivable Area Detection on BDD100K val
1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.
no code implementations • CVPR 2021 • Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo
Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.
1 code implementation • 18 Jun 2021 • Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai
Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video.
Ranked #5 on Temporal Action Localization on HACS
8 code implementations • ICCV 2021 • Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
Ranked #5 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
1 code implementation • 19 Apr 2021 • Dingkang Liang, Xiwu Chen, Wei Xu, Yu Zhou, Xiang Bai
Current weakly-supervised counting methods adopt the CNN to regress a total count of the crowd by an image-to-count paradigm.
1 code implementation • CVPR 2021 • Hao Wang, Xiang Bai, Mingkun Yang, Shenggao Zhu, Jing Wang, Wenyu Liu
Such a task is usually realized by matching a query text to the recognized words, outputted by an end-to-end scene text spotter.
no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai
Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.
1 code implementation • 22 Mar 2021 • Zhen Zhu, Tengteng Huang, Mengde Xu, Baoguang Shi, Wenqing Cheng, Xiang Bai
This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.
1 code implementation • 18 Mar 2021 • Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).
Key Information Extraction
Optical Character Recognition (OCR)
2 code implementations • 24 Feb 2021 • Jian Ding, Nan Xue, Gui-Song Xia, Xiang Bai, Wen Yang, Michael Ying Yang, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang
In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI.
no code implementations • 23 Feb 2021 • Zhiliang Xu, Xiyu Yu, Zhibin Hong, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai
By simply employing some existing and easy-obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.
Ranked #1 on Face Swapping on FaceForensics++ (FID metric)
2 code implementations • 2 Feb 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai
On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.
Ranked #30 on Video Instance Segmentation on OVIS validation
1 code implementation • CVPR 2021 • Xiaolong Liu, Yao Hu, Song Bai, Fei Ding, Xiang Bai, Philip H. S. Torr
Current developments in temporal event or action localization usually target actions captured by a single camera.
Ranked #2 on Temporal Action Localization on MUSES
1 code implementation • 14 Dec 2020 • Yang Liu, Zhen Zhu, Xiang Bai
Visible watermarks are widely used in images to protect copyright ownership.
no code implementations • 9 Dec 2020 • Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai
It is a general labeling method for texts with various shapes and requires low labeling costs.
1 code implementation • 26 Sep 2020 • Wei Zhou, Yukang Wang, Jiajia Chu, Jiehua Yang, Xiang Bai, Yongchao Xu
Specifically, we perform domain adaptation on the affinity relationship between adjacent pixels termed affinity space of source and target domain.
no code implementations • 22 Jul 2020 • Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai
In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them on local devices.
1 code implementation • ECCV 2020 • Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai
Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress.
Ranked #10 on Text Spotting on Total-Text
1 code implementation • ECCV 2020 • Tengteng Huang, Zhe Liu, Xiwu Chen, Xiang Bai
In this paper, we aim at addressing two critical issues in the 3D detection task, including the exploitation of multiple sensors (namely LiDAR point cloud and camera image), as well as the inconsistency between the localization and classification confidence.
no code implementations • 9 Jul 2020 • Changxu Cheng, Wuheng Xu, Xiang Bai, Bin Feng, Wenyu Liu
Chinese text recognition is more challenging than Latin text due to the large amount of fine-grained Chinese characters and the great imbalance over classes, which causes a serious overfitting problem.
1 code implementation • CVPR 2020 • Jianqiang Wan, Yang Liu, Donglai Wei, Xiang Bai, Yongchao Xu
In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD.
3 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN.
1 code implementation • CVPR 2020 • Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai
Experiments on several challenging datasets demonstrate the superiority of GroupDNet on performing the SMIS task.
1 code implementation • ECCV 2020 • Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai
In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.
no code implementations • 12 Feb 2020 • Hao Chen, Yali Wang, Guoyou Wang, Xiang Bai, Yu Qiao
Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework.
no code implementations • 28 Dec 2019 • Zhaoyi Wan, Minghang He, Haoran Chen, Xiang Bai, Cong Yao
Driven by deep learning and the large volume of data, scene text recognition has evolved rapidly in recent years.
Ranked #15 on Scene Text Recognition on ICDAR2015
2 code implementations • 20 Dec 2019 • Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka
A major issue is that the density map on dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding different large density values on a small set of pixels.
no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.
1 code implementation • 11 Dec 2019 • Zhe Liu, Xin Zhao, Tengteng Huang, Ruolan Hu, Yu Zhou, Xiang Bai
In this paper, we focus on exploring the robustness of the 3D object detection in point clouds, which has been rarely discussed in existing approaches.
no code implementations • 9 Dec 2019 • Changxu Cheng, Qiuhui Huang, Xiang Bai, Bin Feng, Wenyu Liu
Script identification in the wild is of great importance in a multi-lingual robust-reading system.
no code implementations • 21 Nov 2019 • Hao Wang, Pu Lu, Hui Zhang, Mingkun Yang, Xiang Bai, Yongchao Xu, Mengchao He, Yongpan Wang, Wenyu Liu
Recently, end-to-end text spotting that aims to detect and recognize text from cluttered images simultaneously has received particularly growing interest in computer vision.
1 code implementation • 21 Nov 2019 • Yongchao Xu, Mingtao Fu, Qimeng Wang, Yukang Wang, Kai Chen, Gui-Song Xia, Xiang Bai
Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts.
Ranked #38 on Object Detection In Aerial Images on DOTA (using extra training data)
15 code implementations • 20 Nov 2019 • Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai
Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text.
Ranked #6 on Scene Text Detection on MSRA-TD500
7 code implementations • 7 Oct 2019 • Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai
Attention-based scene text recognizers have gained huge success, leveraging a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture.
1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai
Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
5 code implementations • ICCV 2019 • Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai
The non-local module works as a particularly useful technique for semantic segmentation, while it is criticized for its prohibitive computation and GPU memory occupation.
Ranked #15 on Semantic Segmentation on COCO-Stuff test
2 code implementations • 8 Aug 2019 • Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai
Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: text conversion module, background inpainting module and fusion module.
no code implementations • ICCV 2019 • Xinwei He, Tengteng Huang, Song Bai, Xiang Bai
By doing so, spatial information across multiple views is captured, which helps to learn a discriminative global embedding for each 3D object.
no code implementations • ICCV 2019 • Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai
Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.
no code implementations • ICCV 2019 • Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai
Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.
no code implementations • 23 Jul 2019 • Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, Cong Yao
Scene text recognition has been an important, active research topic in computer vision for years.
1 code implementation • 13 Jul 2019 • Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai
Different from the previous methods which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as an entirety.
3 code implementations • 30 May 2019 • Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15$\times$ the number of object categories and 5$\times$ the number of instances.
Ranked #1 on Object Detection on iSAID
2 code implementations • CVPR 2019 • Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai
This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.
Ranked #1 on Pose Transfer on Market-1501
1 code implementation • 4 Dec 2018 • Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai
Experimental results show that the proposed TextField outperforms the state-of-the-art methods by a large margin (28% and 8%) on two curved text datasets: Total-Text and CTW1500, respectively, and also achieves very competitive performance on multi-oriented datasets: ICDAR 2015 and MSRA-TD500.
Ranked #22 on Scene Text Detection on Total-Text
2 code implementations • CVPR 2019 • Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi
In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms.
Ranked #1 on Object Skeleton Detection on SK-LARGE
no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.
Ranked #27 on Scene Text Recognition on SVT
1 code implementation • ECCV 2018 • Rui Yu, Zhiyong Dou, Song Bai, Zhao-Xiang Zhang, Yongchao Xu, Xiang Bai
Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion.
no code implementations • ECCV 2018 • Fu-Dong Wang, Nan Xue, Yi-Peng Zhang, Xiang Bai, Gui-Song Xia
Due to an efficient Frank-Wolfe method-based optimization strategy, we can handle graphs with hundreds and thousands of nodes within an acceptable amount of time.
4 code implementations • 9 Jul 2018 • Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille
The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one.
Ranked #2 on Weakly Supervised Object Detection on HICO-DET
1 code implementation • ECCV 2018 • Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai
Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition.
Ranked #3 on Scene Text Detection on ICDAR 2013 (1015)
1 code implementation • 29 Jun 2018 • Yang Xiao, Jun Chen, Yancheng Wang, Zhiguo Cao, Joey Tianyi Zhou, Xiang Bai
To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed.
1 code implementation • 11 May 2018 • Yang Zhou, Zhen Zhu, Xiang Bai, Dani Lischinski, Daniel Cohen-Or, Hui Huang
We demonstrate that this conceptually simple approach is highly effective for capturing large-scale structures, as well as other non-stationary attributes of the input exemplar.
1 code implementation • CVPR 2018 • Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, Xiang Bai
Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models with softmax loss for the classification of 3D data, while learning discriminative features with deep metric learning for 3D object retrieval is more or less neglected.
no code implementations • CVPR 2018 • Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-Song Xia, Xiang Bai
Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks.
Ranked #14 on Scene Text Detection on MSRA-TD500
1 code implementation • CVPR 2018 • Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai
We propose to detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions.
Ranked #2 on Scene Text Detection on ICDAR 2017 MLT
3 code implementations • 9 Jan 2018 • Minghui Liao, Baoguang Shi, Xiang Bai
In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.
Ranked #2 on Scene Text Detection on COCO-Text
1 code implementation • 29 Nov 2017 • Xiang Bai, Mingkun Yang, Tengteng Huang, Zhiyong Dou, Rui Yu, Yongchao Xu
Recently, many methods of person re-identification (Re-ID) rely on part-based feature representation to learn a discriminative pedestrian descriptor.
6 code implementations • CVPR 2018 • Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang
The fully annotated DOTA images contain $188,282$ instances, each of which is labeled by an arbitrary (8 d.o.f.)
Ranked #48 on Object Detection In Aerial Images on DOTA (using extra training data)
no code implementations • ICCV 2017 • Song Bai, Zhichao Zhou, Jingdong Wang, Xiang Bai, Longin Jan Latecki, Qi Tian
This stimulates a great research interest of considering similarity fusion in the framework of diffusion process (i.e., fusion with diffusion) for robust retrieval.
5 code implementations • 31 Aug 2017 • Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai
This report introduces RCTW, a new competition that focuses on Chinese text reading.
no code implementations • 11 Aug 2017 • Rui Yu, Zhichao Zhou, Song Bai, Xiang Bai
Finally, the new features from the same image are fused into one vector for re-ranking.
no code implementations • 27 Jun 2017 • Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu
In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images with specified style from standard font (e.g.
1 code implementation • 6 May 2017 • Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu
Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.
no code implementations • 15 Apr 2017 • Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo
Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification.
4 code implementations • CVPR 2017 • Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu
We propose a novel online instance classifier refinement algorithm to integrate MIL and the instance classifier refinement procedure into a single deep network, and train the network end-to-end with only image-level supervision, i.e., without object location information.
no code implementations • CVPR 2017 • Song Bai, Xiang Bai, Qi Tian
Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images.
Ranked #98 on Person Re-Identification on Market-1501
6 code implementations • CVPR 2017 • Baoguang Shi, Xiang Bai, Serge Belongie
It achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.
Ranked #11 on Scene Text Detection on ICDAR 2013 (1015)
no code implementations • 16 Mar 2017 • Nan Xue, Gui-Song Xia, Xiang Bai, Liangpei Zhang, Weiming Shen
This paper presents a novel approach to junction detection and characterization that exploits the locally anisotropic geometries of a junction and estimates the scales of these geometries using an a contrario model.
no code implementations • 25 Feb 2017 • Tian-Zhu Xiang, Gui-Song Xia, Xiang Bai, Liangpei Zhang
On one hand, the line features are integrated into a local warping model through a designed weight function.
no code implementations • 10 Feb 2017 • Gui-Song Xia, Gang Liu, Xiang Bai, Liangpei Zhang
In contrast with existing works, the proposed method not only inherits the strong ability to depict geometrical aspects of textures and the high robustness to variations of imaging conditions from the shape-based method, but also provides a flexible way to consider shape relationships and to compute high-order statistics on the tree.
3 code implementations • CVPR 2017 • Yun Liu, Ming-Ming Cheng, Xiao-Wei Hu, Kai Wang, Xiang Bai
Using VGG16 network, we achieve state-of-the-art results on several available datasets.
Ranked #5 on Edge Detection on BIPED
4 code implementations • 21 Nov 2016 • Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu
This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression.
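The only post-processing step mentioned above is standard non-maximum suppression. A minimal sketch of greedy NMS for axis-aligned boxes follows; it is a generic illustration and not the TextBoxes code. The function name `nms`, the `(x1, y1, x2, y2)` box layout, and the default `iou_threshold` are assumptions made for this example.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) array-like of (x1, y1, x2, y2) corners.
    scores: (N,) array-like of confidence scores.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    if boxes.shape[0] == 0:
        return []

    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # candidate indices, best score first

    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]

        # Intersection of the best box with every remaining candidate.
        iw = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = iw * ih
        union = np.maximum(areas[best] + areas[rest] - inter, 1e-12)

        # Drop candidates that overlap the best box too strongly.
        order = rest[inter / union <= iou_threshold]
    return keep


if __name__ == "__main__":
    demo_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
    demo_scores = [0.9, 0.8, 0.7]
    print(nms(demo_boxes, demo_scores))
```

In the demo, the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap either and is kept.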
no code implementations • 8 Oct 2016 • Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu
We propose a new multiple instance neural network to learn bag representations, which is different from the existing multiple instance neural networks that focus on estimating instance label.
1 code implementation • 13 Sep 2016 • Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Xiang Bai, Alan Yuille
By observing the relationship between the receptive field sizes of the different layers in the network and the skeleton scales they can capture, we introduce two scale-associated side outputs to each stage of the network.
1 code implementation • 18 Aug 2016 • Gui-Song Xia, Jingwen Hu, Fan Hu, Baoguang Shi, Xiang Bai, Yanfei Zhong, Liangpei Zhang
The goal of AID is to advance the state of the art in scene classification of remote sensing images.
no code implementations • 31 Jul 2016 • Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu
Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.
no code implementations • 29 Jun 2016 • Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, Zhimin Cao
Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge.
Ranked #6 on Scene Text Detection on COCO-Text
no code implementations • 1 May 2016 • Song Bai, Xiang Bai, Longin Jan Latecki, Qi Tian
How to do multidimensional scaling on multiple input distance matrices is still unsolved to the best of our knowledge.
1 code implementation • CVPR 2016 • Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, Xiang Bai
In this paper, we propose a novel approach for text detection in natural images.
Ranked #40 on Scene Text Detection on ICDAR 2015
no code implementations • CVPR 2016 • Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, Longin Jan Latecki
We name the proposed 3D shape search engine, which combines GPU acceleration and Inverted File Twice, as GIFT.
no code implementations • CVPR 2016 • Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Zhijiang Zhang, Xiang Bai
Object skeleton is a useful cue for object detection, complementary to the object contour, as it provides a structural representation to describe the relationship among object parts.
5 code implementations • CVPR 2016 • Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai
We show that the model is able to recognize several types of irregular text, including perspective text and curved text.
Ranked #10 on Scene Text Recognition on ICDAR 2003
no code implementations • ICCV 2015 • Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai
Multiple-instance learning (MIL) has served as an important tool for a wide range of vision applications, for instance, image classification, object detection, and visual tracking.
82 code implementations • 21 Jul 2015 • Baoguang Shi, Xiang Bai, Cong Yao
In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition.
Ranked #12 on Scene Text Recognition on ICDAR 2003