no code implementations • 10 Jan 2025 • YanJie Li, Wenxuan Zhang, Kaisheng Liang, Bin Xiao
This work highlights the potential of dynamic NeRF-based UV mapping for creating more effective adversarial attacks on person detectors, addressing key challenges in modeling human movement and texture modification.
no code implementations • 6 Jan 2025 • Songlin Hou, Fangzhou Lin, Yunmei Huang, Zhe Peng, Bin Xiao
This paper first conducts a comprehensive study of the state-of-the-art AR and localization systems on mobile platforms.
1 code implementation • 20 Dec 2024 • Xiuli Bi, Jian Lu, Bo Liu, Xiaodong Cun, Yong Zhang, Weisheng Li, Bin Xiao
Besides, given some reference images or videos, the parameter-efficient fine-tuning method, i. e. LoRA, can generate high-quality customized concepts, e. g., the specific subject or the motions from a reference video.
no code implementations • 10 Dec 2024 • Jinwu Hu, Yufeng Wang, Shuhai Zhang, Kai Zhou, Guohao Chen, Yu Hu, Bin Xiao, Mingkui Tan
Ensemble reasoning for the strengths of different LLM experts is critical to achieving consistent and satisfactory performance on diverse inputs across a wide range of tasks.
1 code implementation • 7 Dec 2024 • Yan Zhang, Pengcheng Zheng, Chengxiao Zeng, Bin Xiao, Zhenghao Li, Xinbo Gao
In the deblurring branch, we design a pixel-adjustable kernel block (PAKB) to estimate the local and spatial-varying blur kernels.
no code implementations • 5 Dec 2024 • Chenyang Zhu, Bin Xiao, Lin Shi, Shoukun Xu, Xu Zheng
The recent Segment Anything Model (SAM) represents a significant breakthrough in scaling segmentation models, delivering strong performance across various downstream applications in the RGB modality.
1 code implementation • 5 Dec 2024 • Jiuhai Chen, Jianwei Yang, Haiping Wu, Dianqi Li, Jianfeng Gao, Tianyi Zhou, Bin Xiao
We present Florence-VL, a new family of multimodal large language models (MLLMs) with enriched visual representations produced by Florence-2, a generative vision foundation model.
no code implementations • 23 Nov 2024 • Kaisheng Liang, Xuelong Dai, YanJie Li, Dong Wang, Bin Xiao
Recent clean feature mixup methods use random clean features to perturb the feature space but lack optimization for disrupting adversarial examples, overlooking the advantages of attack-specific perturbations.
no code implementations • 7 Sep 2024 • Zhichao Yan, Hui Xue, Yi Zhu, Bin Xiao, Hao Yuan
Accurate segmentation of lesions in pancreatic endoscopic ultrasound (EUS) images is crucial for effective diagnosis and treatment.
no code implementations • 4 Sep 2024 • Bin Xiao, Lei Su
In the realm of Large Language Model (LLM) inference, the inherent structure of transformer models coupled with the multi-GPU tensor parallelism strategy leads to a sequential execution of computation and communication.
no code implementations • 28 Aug 2024 • Lujun Gui, Bin Xiao, Lei Su, WeiPeng Chen
In this paper, we reassess these approaches and propose FSPAD (Feature Sampling and Partial Alignment Distillation for Lossless Speculative Decoding), which introduces two straightforward and effective components within the existing framework to boost lossless speculative decoding.
1 code implementation • 14 Jul 2024 • Xuelong Dai, Bin Xiao
Extensive experiments demonstrate that our proposed attacks outperform state-of-the-art adversarial attack methods against both black-box models and defenses.
no code implementations • 14 Jun 2024 • Bin Xiao, Burak Kantarci, Jiawen Kang, Dusit Niyato, Mohsen Guizani
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently.
Ranked #6 on Table-based Fact Verification on TabFact
1 code implementation • 5 Jun 2024 • Xiuli Bi, Zonglin Yang, Bo Liu, Xiaodong Cun, Chi-Man Pun, Pietro Lio, Bin Xiao
In this work, we suppose that adversarial images are outliers of the natural image manifold and the purification process can be considered as returning them to this manifold.
no code implementations • 5 May 2024 • Xiaoyu Qiao, Weisheng Li, Bin Xiao, Yuping Huang, Lijian Yang
However, the hyperparameters in the sampling process require subtle tuning, otherwise the results can be severely corrupted by hallucination artifacts, especially with out-of-distribution test data.
1 code implementation • 1 May 2024 • Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, WeiPeng Chen, Bin Cui
Consequently, the GPU spends most of its time on memory transfer instead of computation.
no code implementations • 22 Apr 2024 • Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
We introduce phi-3-mini, a 3. 8 billion parameter language model trained on 3. 3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3. 5 (e. g., phi-3-mini achieves 69% on MMLU and 8. 38 on MT-bench), despite being small enough to be deployed on a phone.
Ranked #5 on MMR total on MRR-Benchmark (using extra training data)
1 code implementation • 29 Mar 2024 • Xu Ma, Xiyang Dai, Jianwei Yang, Bin Xiao, Yinpeng Chen, Yun Fu, Lu Yuan
We demonstrate that the modulation mechanism is particularly well suited for efficient networks and further tailor the modulation design by proposing the efficient modulation (EfficientMod) block, which is considered the essential building block for our networks.
1 code implementation • 15 Mar 2024 • Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai
Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly overhead in memory to maintain high resolution.
1 code implementation • 1 Dec 2023 • Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir
However, existing detection-based models usually cannot perform as well as other types of solutions regarding cell-level TSR metrics, such as TEDS, and the underlying reasons limiting the performance of these models on the TSR task are also not well-explored.
1 code implementation • CVPR 2024 • Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
Ranked #1 on Visual Grounding on RefCOCO+ val (using extra training data)
no code implementations • 2 Nov 2023 • Xiuli Bi, Bo Liu, Fan Yang, Bin Xiao, Weisheng Li, Gao Huang, Pamela C. Cosman
This paper approaches the generated image detection problem from a new perspective: Start from real images.
no code implementations • 20 Oct 2023 • Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo, Bin Xiao
Then the passive party, who owns only features of the sample, injects the blinding factor into the local embedding and sends it to the active party.
no code implementations • 1 Oct 2023 • YanJie Li, Bin Xie, Songtao Guo, Yuanyuan Yang, Bin Xiao
Lots of papers have emerged to investigate the robustness and safety of deep learning models against adversarial attacks.
1 code implementation • ICCV 2023 • Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi, Chen, Xinggang Wang, Hongyang Chao, Han Hu
In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.
2 code implementations • 19 Sep 2023 • Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, WeiPeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu
Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering.
no code implementations • 10 Aug 2023 • YanJie Li, Mingxing Duan, Xuelong Dai, Bin Xiao
In the first stage, we extract multi-scale style embeddings by a pyramid-like network and identity embeddings by a pretrained FR model and propose a novel Attention-guided Adaptive Instance Normalization layer (AAIN) to merge them via background-patch cross-attention maps.
1 code implementation • 24 Jul 2023 • Xuelong Dai, Kaisheng Liang, Bin Xiao
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
no code implementations • 4 Jun 2023 • Xinhang Wan, Bin Xiao, Xinwang Liu, Jiyuan Liu, Weixuan Liang, En Zhu
Such an incomplete continual data problem (ICDP) in MVC is tough to solve since incomplete information with continual data increases the difficulty of extracting consistent and complementary knowledge among views.
1 code implementation • 30 May 2023 • Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir
Table Detection (TD) is a fundamental task to enable visually rich document understanding, which requires the model to extract information without information loss.
no code implementations • 21 May 2023 • ZiYi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities.
no code implementations • 4 May 2023 • Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir
Moreover, to enrich the data sources, we propose a new ICT-TD dataset using the PDF files of Information and Communication Technologies (ICT) commodities, a different domain containing unique samples that hardly appear in open datasets.
1 code implementation • CVPR 2023 • Kaisheng Liang, Bin Xiao
Our method can prevent adversarial examples from using non-robust style features and help generate transferable perturbations.
no code implementations • 20 Apr 2023 • Lingyuan Meng, Ke Liang, Bin Xiao, Sihang Zhou, Yue Liu, Meng Liu, Xihong Yang, Xinwang Liu
Moreover, most of the existing methods ignore leveraging the beneficial information from aliasing relations (AR), i. e., data-rich relations with similar contextual semantics to the target data-poor relation.
1 code implementation • CVPR 2023 • Ping Chen, Xingpeng Zhang, Ye Li, Ju Tao, Bin Xiao, Bing Wang, Zongjie Jiang
Inspired by the transfer learning, we designed the Delta Age AdaIN (DAA) operation to obtain the feature difference with each age, which obtains the style map of each age through the learned values representing the mean and standard deviation.
no code implementations • 15 Feb 2023 • Wenxuan Tu, Bin Xiao, Xinwang Liu, Sihang Zhou, Zhiping Cai, Jieren Cheng
With the development of various applications, such as social networks and knowledge graphs, graph data has been ubiquitous in the real world.
1 code implementation • CVPR 2023 • Bin Xiao, Yang Hu, Bo Liu, Xiuli Bi, Weisheng Li, Xinbo Gao
Since their binarization processes are not a component of the network, the learning-based binary descriptor cannot fully utilize the advances of deep learning.
1 code implementation • CVPR 2023 • Yongchao Wang, Bin Xiao, Xiuli Bi, Weisheng Li, Xinbo Gao
Inspired by the plain contrast idea, MCF introduces two different subnets to explore and utilize the discrepancies between subnets to correct cognitive bias of the model.
no code implementations • 28 Nov 2022 • Jingcan Duan, Bin Xiao, Siwei Wang, Haifang Zhou, Xinwang Liu
The average node-pair similarity can be regarded as the topology anomaly degree of nodes within substructures.
no code implementations • 3 Nov 2022 • Bin Xiao, Yakup Akkaya, Murat Simsek, Burak Kantarci, Ala Abu Alkheir
Table Structure Recognition (TSR) aims to represent tables with complex structures in a machine-interpretable format so that the tabular data can be processed automatically.
1 code implementation • 12 Oct 2022 • Bin Xiao, Chien-Liang Liu, Wen-Hoar Hsaio
Our proposed model uses word-embedding representations as semantic features to help train the embedding network and a semantic cross-attention module to bridge the semantic features into the typical visual modal.
no code implementations • 11 Aug 2022 • Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir
To transform the tabular data in electronic documents into a machine-interpretable format and provide layout and semantic information for information extraction and interpretation, we define a Table Structure Recognition (TSR) task and a Table Cell Type Classification (CTC) task.
1 code implementation • 26 Jul 2022 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan
Large-scale multi-modal contrastive pre-training has demonstrated great utility to learn transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space.
2 code implementations • 21 Jul 2022 • Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
It achieves a top-1 accuracy of 84. 8% on ImageNet-1k with only 21M parameters, being comparable to Swin-B pretrained on ImageNet-21k while using 4. 2 times fewer parameters.
Ranked #145 on Image Classification on ImageNet
no code implementations • 19 Jul 2022 • Zhenrong Shen, Xi Ouyang, Bin Xiao, Jie-Zhi Cheng, Qian Wang, Dinggang Shen
Moreover, we propose to synthesize nodule CXR images by controlling the disentangled nodule attributes for data augmentation, in order to better compensate for the nodules that are easily missed in the detection task.
no code implementations • CVPR 2023 • YanJie Li, Yiquan Li, Xuelong Dai, Songtao Guo, Bin Xiao
2D face recognition has been proven insecure for physical adversarial attacks.
no code implementations • 3 May 2022 • ZiYi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview.
no code implementations • 22 Apr 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan
Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data.
Ranked #4 on Visual Question Answering (VQA) on VCR (Q-A) test
2 code implementations • CVPR 2022 • Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks.
Ranked #227 on Image Classification on ImageNet
1 code implementation • CVPR 2022 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao
Particularly, it attains gains up to 9. 2% and 14. 5% in average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.
3 code implementations • 7 Apr 2022 • Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan
We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention.
Ranked #1 on Medical Image Classification on ImageNet
no code implementations • 8 Mar 2022 • Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir
Table Structure Recognition (TSR) problem aims to recognize the structure of a table and transform the unstructured tables into a structured and machine-readable format so that the tabular data can be further analysed by the down-stream tasks, such as semantic modeling and information retrieval.
no code implementations • 15 Jan 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan
Experiments demonstrate that our proposed CLIP-TD leads to exceptional gains in the low-shot (up to 51. 9%) and domain-shifted (up to 71. 3%) conditions of VCR, while simultaneously improving performance under standard fully-supervised conditions (up to 2%), achieving state-of-art performance on VCR compared to other single models that are pretrained with image-text data only.
1 code implementation • NeurIPS 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao
With focal attention, we propose a new variant of Vision Transformer models, called Focal Transformers, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.
2 code implementations • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.
Ranked #1 on Action Recognition In Videos on Kinetics-600
1 code implementation • 17 Nov 2021 • Xuelong Dai, YanJie Li, Hua Dai, Bin Xiao
The unrestricted adversarial attack loss is incorporated in the special adversarial training of GAN, which enables the generator to generate the adversarial examples to spoof the target network.
no code implementations • 29 Sep 2021 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan
Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.
3 code implementations • 1 Jul 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao
With focal self-attention, we propose a new variant of Vision Transformer models, called Focal Transformer, which achieves superior performance over the state-of-the-art vision Transformers on a range of public image classification and object detection benchmarks.
Ranked #19 on Instance Segmentation on COCO test-dev
1 code implementation • ICML Workshop AML 2021 • Fan Liu, Shuyu Zhao, Xuelong Dai, Bin Xiao
Although adversarial training (AT) methods such as Adversarial Query (AQ) can improve the adversarially robust performance of meta-learning models, AT is still computationally expensive training.
1 code implementation • ICLR 2022 • Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Mei Gao, Bin Xiao, Xiyang Dai, Lu Yuan, Jianfeng Gao
This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning.
Ranked #17 on Self-Supervised Image Classification on ImageNet
Representation Learning Self-Supervised Image Classification
3 code implementations • CVPR 2021 • Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang
In this paper, we present a novel dynamic head framework to unify object detection heads with attentions.
Ranked #7 on Object Detection on COCO 2017 val (AP75 metric)
15 code implementations • CVPR 2021 • Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang
We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.
Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)
2 code implementations • CVPR 2021 • Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang
Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions.
3 code implementations • ICCV 2021 • Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao
This paper presents a new Vision Transformer (ViT) architecture Multi-Scale Vision Longformer, which significantly enhances the ViT of \cite{dosovitskiy2020image} for encoding high-resolution images using two techniques.
Ranked #47 on Instance Segmentation on COCO minival
16 code implementations • ICCV 2021 • Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang
We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
Ranked #2 on Image Classification on Flowers-102 (using extra training data)
no code implementations • ICCV 2021 • Xiuli Bi, Zhipeng Zhang, Bin Xiao
For detecting the tampered regions, a forgery localization generator GM is proposed based on a multi-decoder-single-task strategy.
no code implementations • ICCV 2021 • Bin Xiao, Haifeng Wu, Xiuli Bi
The proposed DTMNet is an end-to-end deep neural network with only one convolutional layer and three fully connected layers.
no code implementations • 1 Jan 2021 • Depu Meng, Zigang Geng, Zhirong Wu, Bin Xiao, Houqiang Li, Jingdong Wang
The proposed consistent instance classification (ConIC) approach simultaneously optimizes the classification loss and an additional consistency loss explicitly penalizing the feature dissimilarity between the augmented views from the same instance.
no code implementations • 11 Dec 2020 • Bin Xiao, Tao Geng, Xiuli Bi, Weisheng Li
In this paper, a color-related local binary pattern (cLBP) which learns the dominant patterns from the decoded LBP is proposed for color images recognition.
no code implementations • 3 Dec 2020 • Bo Liu, Ranglei Wu, Xiuli Bi, Bin Xiao, Weisheng Li, Guoyin Wang, Xinbo Gao
The unfixed encoder autonomously learns the image fingerprints that differentiate between the tampered and non-tampered regions, whereas the fixed encoder intentionally provides the direction information that assists the learning and detection of the network.
1 code implementation • 9 Sep 2020 • Bin Xiao, Chien-Liang Liu, Wen-Hoar Hsaio
We conclude that the success of metric-learning based approaches lies in the data embedding, the representative of each class, and the distance metric.
1 code implementation • 28 Jun 2020 • Ke Sun, Zigang Geng, Depu Meng, Bin Xiao, Dong Liu, Zhao-Xiang Zhang, Jingdong Wang
The typical bottom-up human pose estimation framework includes two stages, keypoint detection and grouping.
no code implementations • AAAI 2020 • Haiping Wu, Bin Xiao
n this work, we tackle the problem of estimating 3D human pose in camera space from a monocular image.
Ranked #22 on 3D Human Pose Estimation on MPI-INF-3DHP (PCK metric)
1 code implementation • cvpr 2019 workshop 2019 • Xiuli Bi, Yang Wei, Bin Xiao, Weisheng Li
The core idea of the RRU-Net is to strengthen the learning way of CNN, which is inspired by the recall and the consolidation mechanism of the human brain and implemented by the propagation and the feedback process of the residual in CNN.
19 code implementations • CVPR 2020 • Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S. Huang, Lei Zhang
HigherHRNet even surpasses all top-down methods on CrowdPose test (67. 6% AP), suggesting its robustness in crowded scene.
Ranked #2 on Pose Estimation on UAV-Human
42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric, using extra training data)
39 code implementations • 9 Apr 2019 • Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang
The proposed approach achieves superior results to existing single-model networks on COCO object detection.
Ranked #7 on Semantic Segmentation on LIP val
39 code implementations • CVPR 2019 • Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the mutli-resolution subnetworks in parallel.
Ranked #1 on 3D Pose Estimation on HARPER
27 code implementations • ECCV 2018 • Bin Xiao, Haiping Wu, Yichen Wei
There has been significant progress on pose estimation and increasing interests on pose tracking in recent years.
Ranked #2 on 2D Human Pose Estimation on JHMDB (2D poses only)
2 code implementations • ECCV 2018 • Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, Yichen Wei
State-of-the-art human pose estimation methods are based on heat map representation.
Ranked #23 on Pose Estimation on MPII Human Pose
no code implementations • ICCV 2017 • Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang
The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.
2 code implementations • 10 Jul 2017 • Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang
The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.