no code implementations • 2 Jun 2023 • Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha
We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding (VDU).
Optical Character Recognition (OCR) • Visual Question Answering (VQA)
no code implementations • 17 Apr 2023 • Yunruo Zhang, Tianyu Du, Shouling Ji, Peng Tang, Shanqing Guo
In this paper, we propose the first certified defense against multi-frame attacks for RNNs called RNN-Guard.
no code implementations • 5 Sep 2022 • Yang Nan, Javier Del Ser, Zeyu Tang, Peng Tang, Xiaodan Xing, Yingying Fang, Francisco Herrera, Witold Pedrycz, Simon Walsh, Guang Yang
especially for cohorts with different lung diseases.
no code implementations • 4 Aug 2022 • Yang Nan, Peng Tang, Guyue Zhang, Caihong Zeng, Zhihong Liu, Zhifan Gao, Heye Zhang, Guang Yang
However, most machine learning and deep learning based approaches are supervised and are developed with a large number of training samples, for which pixel-wise annotations are expensive and sometimes impossible to obtain.
no code implementations • 11 Mar 2022 • Yang Nan, Fengyi Li, Peng Tang, Guyue Zhang, Caihong Zeng, Guotong Xie, Zhihong Liu, Guang Yang
Recognition of glomerular lesions is key to diagnosis and treatment planning in kidney pathology; however, coexisting glomerular structures such as mesangial regions exacerbate the difficulty of this task.
no code implementations • 31 May 2021 • Yan Wang, Peng Tang, Yuyin Zhou, Wei Shen, Elliot K. Fishman, Alan L. Yuille
We instantiate both the global and the local classifiers by multiple instance learning (MIL), where the attention guidance, indicating roughly where the PDAC regions are, is the key to bridging them. For global MIL-based normal/PDAC classification, attention serves as a weight for each instance (voxel) during MIL pooling, which eliminates the distraction from the background. For local MIL-based semi-supervised PDAC segmentation, the attention guidance is inductive: it not only provides bag-level pseudo-labels to training data without per-voxel annotations for MIL training, but also acts as a proxy for an instance-level classifier.
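For concreteness, here is a minimal PyTorch sketch of attention-weighted MIL pooling for the global normal/PDAC bag classifier: each instance (voxel feature) receives an attention weight that scales its contribution to the bag representation. The feature dimension, attention network, and two-class head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (PyTorch) of attention-weighted MIL pooling for bag-level
# (normal vs. PDAC) classification. Shapes and layer sizes are illustrative
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, feat_dim=64, num_classes=2):
        super().__init__()
        # Produces a scalar attention score per instance (voxel feature).
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.Tanh(), nn.Linear(32, 1)
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, instance_feats):
        # instance_feats: (num_instances, feat_dim) for one bag (one scan).
        scores = self.attention(instance_feats)            # (N, 1)
        weights = torch.softmax(scores, dim=0)             # attention over instances
        bag_feat = (weights * instance_feats).sum(dim=0)   # weighted MIL pooling
        return self.classifier(bag_feat), weights.squeeze(-1)

# Usage: one bag of 500 voxel features.
bag = torch.randn(500, 64)
logits, attn = AttentionMILPooling()(bag)
print(logits.shape, attn.shape)  # torch.Size([2]) torch.Size([500])
```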
1 code implementation • ICLR 2021 • Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie
To prevent models from exclusively attending to a single cue in representation learning, we augment training data with images containing conflicting shape and texture information (e.g., an image with chimpanzee shape but lemon texture) and, most importantly, provide the corresponding supervision from both shape and texture simultaneously (see the sketch below).
Ranked #535 on Image Classification on ImageNet
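As a rough illustration of the dual-supervision idea, the sketch below computes a cross-entropy loss against a soft target that credits both the shape class and the texture class of a cue-conflict image. The label weights, the loss form, and the assumption that cue-conflict images come from a separate style-transfer step are illustrative, not taken from the paper.

```python
# Minimal sketch (PyTorch) of training on cue-conflict images with
# supervision from both the shape class and the texture class.
# The cue-conflict images themselves would come from a style-transfer
# pipeline (not shown); the 0.5/0.5 label mixing is an assumption.
import torch
import torch.nn.functional as F

def shape_texture_loss(logits, shape_labels, texture_labels,
                       shape_weight=0.5, texture_weight=0.5):
    """Cross-entropy against a soft target that credits both cues.

    logits:         (B, num_classes) model outputs on cue-conflict images
    shape_labels:   (B,) class index of the shape source image
    texture_labels: (B,) class index of the texture source image
    """
    num_classes = logits.size(1)
    soft_target = (shape_weight * F.one_hot(shape_labels, num_classes).float()
                   + texture_weight * F.one_hot(texture_labels, num_classes).float())
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_target * log_probs).sum(dim=1).mean()

# Usage with random stand-ins for a batch of cue-conflict images.
logits = torch.randn(8, 1000)
shape_y = torch.randint(0, 1000, (8,))
texture_y = torch.randint(0, 1000, (8,))
print(shape_texture_loss(logits, shape_y, texture_y).item())
```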
no code implementations • 25 Jan 2020 • Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong
In this paper, we study the problem of weakly-supervised temporal grounding of sentences in videos.
no code implementations • 15 Jan 2020 • Peng Tang, Chetan Ramaiah, Yan Wang, Ran Xu, Caiming Xiong
two-stage object detectors) by training on both labeled and unlabeled data.
1 code implementation • 11 May 2019 • Hongru Zhu, Peng Tang, Jeongho Park, Soojin Park, Alan Yuille
We test both humans and the above-mentioned computational models in a challenging task of object recognition under extreme occlusion, where target objects are heavily occluded by irrelevant real objects in real backgrounds.
no code implementations • ECCV 2018 • Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, Alan Yuille
The Convolutional Neural Network (CNN) based region proposal generation method (i.e., region proposal network), trained using bounding box annotations, is an essential component in modern fully supervised object detectors.
4 code implementations • 9 Jul 2018 • Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille
The iterative instance classifier refinement is implemented online using multiple streams in a convolutional neural network, where the first stream is an MIL network and each subsequent stream is an instance classifier refinement stream supervised by the preceding one (see the sketch below).
Ranked #1 on Weakly Supervised Object Detection on ImageNet
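A simplified sketch of the multi-stream refinement idea: each refinement stream is trained with pseudo instance labels derived from the preceding stream's scores, here by taking the top-scoring proposal for each image-level class and marking proposals that overlap it as positives. The IoU threshold, the number of streams, and the plain (C+1)-way classifiers standing in for all streams are simplifying assumptions, not the exact procedure in the paper.

```python
# Simplified sketch (PyTorch) of multi-stream instance classifier refinement:
# stream 0 stands in for the MIL branch, and each later stream is trained with
# pseudo instance labels derived from the preceding stream's scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

def box_iou(boxes_a, boxes_b):
    """IoU between two sets of boxes in (x1, y1, x2, y2) format."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    lt = torch.max(boxes_a[:, None, :2], boxes_b[None, :, :2])
    rb = torch.min(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def pseudo_labels_from_stream(prev_scores, boxes, image_labels, iou_thresh=0.5):
    """Assign each proposal a pseudo class label from the preceding stream.

    prev_scores:  (num_proposals, num_classes + 1) softmax scores, class 0 = background
    boxes:        (num_proposals, 4) proposal boxes
    image_labels: class indices (1-based here) present in the image
    """
    labels = torch.zeros(boxes.size(0), dtype=torch.long)   # 0 = background
    for c in image_labels:
        top = prev_scores[:, c].argmax()                     # top proposal for class c
        overlap = box_iou(boxes, boxes[top:top + 1]).squeeze(1)
        labels[overlap >= iou_thresh] = c                    # propagate label to neighbors
    return labels

# Usage: three streams on one image's proposal features (random stand-ins).
feat_dim, num_classes, num_props = 128, 20, 300
feats = torch.randn(num_props, feat_dim)
boxes = torch.rand(num_props, 4) * 200
boxes[:, 2:] += boxes[:, :2]                                 # ensure x2 > x1, y2 > y1
streams = nn.ModuleList(nn.Linear(feat_dim, num_classes + 1) for _ in range(3))

prev = F.softmax(streams[0](feats), dim=1)
for k in range(1, len(streams)):
    targets = pseudo_labels_from_stream(prev.detach(), boxes, image_labels=[3, 7])
    loss_k = F.cross_entropy(streams[k](feats), targets)     # refinement supervision
    prev = F.softmax(streams[k](feats), dim=1)
```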
no code implementations • 7 Apr 2018 • Yuyin Zhou, Yan Wang, Peng Tang, Song Bai, Wei Shen, Elliot K. Fishman, Alan L. Yuille
In multi-organ segmentation of abdominal CT scans, most existing fully supervised deep learning algorithms require large amounts of voxel-wise annotations, which are usually difficult, expensive, and slow to obtain.
no code implementations • 7 Apr 2018 • Yan Wang, Yuyin Zhou, Peng Tang, Wei Shen, Elliot K. Fishman, Alan L. Yuille
Based on the fact that very hard samples might have annotation errors, we propose a new sample selection policy, named Relaxed Upper Confident Bound (RUCB).
no code implementations • 30 Jan 2018 • Peng Tang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wen-Jun Zeng, Jingdong Wang
In particular, our method improves results by 8.8% over the static image detector for fast moving objects.
no code implementations • 19 Sep 2017 • Gangming Zhao, Zhao-Xiang Zhang, He Guan, Peng Tang, Jingdong Wang
Most convolutional neural networks share the same characteristic: each convolutional layer is followed by a nonlinear activation layer, where the Rectified Linear Unit (ReLU) is the most widely used choice.
1 code implementation • 6 May 2017 • Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu
Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.
4 code implementations • CVPR 2017 • Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu
We propose a novel online instance classifier refinement algorithm to integrate MIL and the instance classifier refinement procedure into a single deep network, and train the network end-to-end with only image-level supervision, i.e., without object location information (a sketch of the image-level MIL branch follows below).
Ranked #4 on Weakly Supervised Object Detection on ImageNet
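To make the image-level supervision concrete, the sketch below shows a WSDDN-style MIL branch of the kind OICR builds on: two per-proposal softmax scores (over classes and over proposals) are multiplied and summed into image-level scores, trained with a multi-label loss against image labels only. The layer sizes and the binary cross-entropy loss are assumptions, not the paper's exact configuration.

```python
# Minimal sketch (PyTorch) of an image-level MIL branch: per-proposal scores
# are aggregated into image-level class scores so the network can be trained
# with image-level labels only (no object locations).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MILBranch(nn.Module):
    def __init__(self, feat_dim=128, num_classes=20):
        super().__init__()
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.det_head = nn.Linear(feat_dim, num_classes)

    def forward(self, proposal_feats):
        # proposal_feats: (num_proposals, feat_dim) for one image.
        cls = F.softmax(self.cls_head(proposal_feats), dim=1)  # softmax over classes
        det = F.softmax(self.det_head(proposal_feats), dim=0)  # softmax over proposals
        proposal_scores = cls * det                            # (P, C)
        image_scores = proposal_scores.sum(dim=0).clamp(0, 1)  # (C,) image-level scores
        return image_scores, proposal_scores

# Image-level supervision only: multi-label BCE against the classes present.
branch = MILBranch()
feats = torch.randn(300, 128)
image_labels = torch.zeros(20)
image_labels[[3, 7]] = 1.0
image_scores, _ = branch(feats)
loss = F.binary_cross_entropy(image_scores, image_labels)
```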
no code implementations • 8 Oct 2016 • Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu
We propose a new multiple instance neural network that learns bag representations, which differs from existing multiple instance neural networks that focus on estimating instance labels.
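A minimal sketch of the bag-representation idea: instance features are pooled into a single bag embedding, and the classifier operates on the bag rather than on per-instance labels. The encoder sizes and the choice of max pooling are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of a bag-level multiple instance network:
# instance features are pooled into one bag embedding, and the classifier
# predicts the bag label directly instead of per-instance labels.
import torch
import torch.nn as nn

class BagNet(nn.Module):
    def __init__(self, in_dim=166, hidden=64, num_classes=2):
        super().__init__()
        self.instance_encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.bag_classifier = nn.Linear(hidden, num_classes)

    def forward(self, bag):
        # bag: (num_instances, in_dim); bags may have different sizes.
        h = self.instance_encoder(bag)        # (N, hidden) instance embeddings
        bag_repr = h.max(dim=0).values        # MIL pooling -> bag representation
        return self.bag_classifier(bag_repr)  # bag-level prediction

bag = torch.randn(12, 166)   # e.g. a MUSK-style bag of 12 instances (assumed dims)
print(BagNet()(bag).shape)   # torch.Size([2])
```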
no code implementations • 31 Jul 2016 • Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu
Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.
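For reference, the sketch below computes the standard Fisher Vector encoding (gradients with respect to the GMM means only) of a set of patch features in NumPy; it illustrates the encoding FisherNet builds on, not the paper's end-to-end layer, and the GMM parameters are random stand-ins.

```python
# Minimal NumPy sketch of Fisher Vector encoding (gradients w.r.t. GMM means
# only) for a set of patch features. This shows the standard FV formulation,
# not FisherNet's end-to-end layer.
import numpy as np

def fisher_vector_means(X, pi, mu, sigma):
    """X: (N, D) patch features; pi: (K,) mixture weights;
    mu: (K, D) means; sigma: (K, D) diagonal std deviations."""
    N, _ = X.shape
    # Posterior (soft assignment) of each feature to each Gaussian.
    diff = X[:, None, :] - mu[None, :, :]                                    # (N, K, D)
    log_prob = -0.5 * np.sum((diff / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2), axis=2)
    log_w = np.log(pi) + log_prob                                            # (N, K)
    gamma = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)                                # (N, K)
    # Gradient w.r.t. means, normalized as in the standard FV.
    fv = (gamma[:, :, None] * (diff / sigma)).sum(axis=0)                    # (K, D)
    fv /= N * np.sqrt(pi)[:, None]
    return fv.ravel()                                                        # (K * D,)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32))    # 200 patch features of dimension 32
pi = np.full(8, 1 / 8)                # 8-component GMM (random stand-in)
mu = rng.standard_normal((8, 32))
sigma = np.ones((8, 32))
print(fisher_vector_means(X, pi, mu, sigma).shape)  # (256,)
```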