1 code implementation • 30 Jan 2023 • Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu
The complex scene understanding ability of CLIP enables the discriminator to accurately assess the image quality.
no code implementations • 24 Jan 2023 • Baptiste Chopin, Hao Tang, Mohamed Daoudi
The generation of natural human motion interactions is a hot topic in computer vision and computer animation.
no code implementations • 9 Jan 2023 • Hao Tang, Kevin Liang, Kristen Grauman, Matt Feiszli, Weiyao Wang
We thus introduce EgoTracks, a new dataset for long-term egocentric visual object tracking.
1 code implementation • 12 Dec 2022 • Hui Wei, Zhixiang Wang, Xuemei Jia, Yinqiang Zheng, Hao Tang, Shin'ichi Satoh, Zheng Wang
Adversarial attacks on thermal infrared imaging expose the risk of related applications.
no code implementations • 7 Dec 2022 • Hao Ding, Changchang Sun, Hao Tang, Dawen Cai, Yan Yan
Recently, due to the increasing requirements of medical imaging applications and the professional requirements of annotating medical images, few-shot learning has gained increasing attention in the medical image semantic segmentation field.
1 code implementation • 19 Nov 2022 • Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.
no code implementations • 17 Nov 2022 • Tzu-Quan Lin, Hung-Yi Lee, Hao Tang
Self-supervised models have had great success in learning speech representations that can generalize to various downstream tasks.
no code implementations • 17 Nov 2022 • Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-Yi Lee, Hao Tang
Despite the success of Transformers in self-supervised learning with applications to various downstream tasks, the computational cost of training and inference remains a major challenge for applying these models to a wide spectrum of devices.
1 code implementation • 12 Nov 2022 • Hao Tang, Ling Shao, Philip H. S. Torr, Nicu Sebe
To further capture the change in pose of each part more precisely, we propose a novel part-aware bipartite graph reasoning (PBGR) block to decompose the task of reasoning the global structure transformation with a bipartite graph into learning different local transformations for different semantic body/face parts.
no code implementations • 12 Nov 2022 • Hao Tang, Lei Ding, Songsong Wu, Bin Ren, Nicu Sebe, Paolo Rota
The proposed TSDPC is a generic and powerful framework with two advantages over previous works; one is that it can calculate the number of key frames automatically.
no code implementations • 2 Nov 2022 • Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang
That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches.
no code implementations • 29 Oct 2022 • Sung-Lin Yeh, Hao Tang
While discrete latent variable models have had great success in self-supervised learning, most models assume that frames are independent.
no code implementations • 28 Oct 2022 • Ramon Sanabria, Hao Tang, Sharon Goldwater
Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments.
1 code implementation • 27 Oct 2022 • Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang
Moreover, by coupling the proposed sampling method with an unconditional DM, i.e., a DM with no auxiliary inputs to its noise predictor, we can generalize it to a wide range of SR setups.
no code implementations • 13 Oct 2022 • Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-Yi Lee, Hao Tang
Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates, but also brings significant speed-up in inference.
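As a rough illustration of what subsampling a frame sequence means, the sketch below keeps every k-th frame, which lowers the frame rate and shortens the sequence that later layers must process. The helper name `subsample_frames` and the fixed-stride scheme are assumptions made for illustration; the listing does not describe the subsampling scheme the paper actually uses.

```python
def subsample_frames(frames, stride):
    """Keep every `stride`-th frame (hypothetical helper, fixed stride only)."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return frames[::stride]


# Toy sequence of 10 frames, each a 2-dimensional feature vector.
frames = [[float(t), float(t) * 0.5] for t in range(10)]

# A stride of 2 halves the number of frames passed downstream.
halved = subsample_frames(frames, 2)
print(len(frames), len(halved))
```

Halving the sequence length is where an inference speed-up would come from in this simplified picture, since every subsequent layer sees fewer frames.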
1 code implementation • 10 Oct 2022 • Yitong Xia, Hao Tang, Radu Timofte, Luc van Gool
NeRFmm is a Neural Radiance Fields (NeRF) model that deals with joint optimization tasks, i.e., reconstructing real-world scenes and registering camera parameters simultaneously.
1 code implementation • 4 Oct 2022 • Zican Zha, Hao Tang, Yunlian Sun, Jinhui Tang
To address this challenging task, we propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric.
1 code implementation • 30 Sep 2022 • Hui Wei, Hao Tang, Xuemei Jia, Hanxun Yu, Zhubo Li, Zhixiang Wang, Shin'ichi Satoh, Zheng Wang
Although Deep Neural Networks (DNNs) have achieved impressive results in computer vision, their exposed vulnerability to adversarial attacks remains a serious concern.
1 code implementation • 16 Sep 2022 • Haoyu Ma, Zhe Wang, Yifei Chen, Deying Kong, Liangjian Chen, Xingwei Liu, Xiangyi Yan, Hao Tang, Xiaohui Xie
In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which can locate a rough human mask and perform self-attention only within selected tokens.
Ranked #1 on 3D Human Pose Estimation on Human3.6M (Using 2D ground-truth joints metric, using extra training data)
1 code implementation • 5 Sep 2022 • Hao Tang, Nicu Sebe
We propose a simple yet powerful Landmark guided Generative Adversarial Network (LandmarkGAN) for the facial expression-to-expression translation using a single image, which is an important and challenging task in computer vision since the expression-to-expression translation is a non-linear and non-aligned problem.
no code implementations • 30 Aug 2022 • Shuanglin Yan, Hao Tang, Liyan Zhang, Jinhui Tang
Secondly, we propose an implicit local alignment module to adaptively aggregate image and text features to a set of modality-shared semantic topic centers, and implicitly learn the local fine-grained correspondence between images and texts without additional supervision information and complex cross-modal interactions.
1 code implementation • 26 Aug 2022 • Jichao Zhang, Aliaksandr Siarohin, Yahui Liu, Hao Tang, Nicu Sebe, Wei Wang
To this end, we introduce a conditional GNeRF model that uses specific attribute labels as input in order to improve the controllabilities and disentangling abilities of 3D-aware generative models.
1 code implementation • 25 Aug 2022 • Jianbing Wu, Hong Liu, Wei Shi, Hao Tang, Jingwen Guo
To mitigate the resolution degradation issue and mine identity-sensitive cues from human faces, we propose to restore the missing facial details using prior facial knowledge, which is then propagated to a smaller network.
no code implementations • 19 Aug 2022 • Pan Xie, Qipeng Zhang, Zexian Li, Hao Tang, Yao Du, Xiaohui Hu
To address this issue, we propose a vector quantized diffusion method for conditional pose sequences generation, called PoseVQ-Diffusion, which is an iterative non-autoregressive method.
no code implementations • 10 Aug 2022 • Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang
Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width.
1 code implementation • 25 Jul 2022 • Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang
Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence.
1 code implementation • 21 Jul 2022 • Guolei Sun, Yun Liu, Hao Tang, Ajad Chhatkuli, Le Zhang, Luc van Gool
The essence of video semantic segmentation (VSS) is how to leverage temporal information for prediction.
1 code implementation • 21 Jul 2022 • JieZhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc van Gool
These issues can be alleviated by a cascade of three separate sub-tasks, including video deblurring, frame interpolation, and super-resolution, which, however, would fail to capture the spatial and temporal correlations among video sequences.
no code implementations • 17 Jul 2022 • Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan
Brain vessel image segmentation can be used as a promising biomarker for better prevention and treatment of different diseases.
no code implementations • 16 Jul 2022 • Daqian Shi, Xiaolei Diao, Hao Tang, Xiaomin Li, Hao Xing, Hao Xu
SENet aims to preserve the structural consistency of the character and normalize complex noise.
1 code implementation • 16 Jul 2022 • Daqian Shi, Xiaolei Diao, Lida Shi, Hao Tang, Yang Chi, Chuntao Li, Hao Xu
Degraded images commonly exist in the general sources of character images, leading to unsatisfactory character recognition results.
no code implementations • 12 Jul 2022 • Xiaolei Diao, Daqian Shi, Hao Tang, Yanzeng Li, Lei Wu, Hao Xu
In this paper, we propose a zero-shot character recognition framework via radical extraction (REZCR) to improve the recognition performance of few-sample character categories in the tail.
no code implementations • 9 Jul 2022 • Bin Ren, Hao Tang, Yiming Wang, Xia Li, Wei Wang, Nicu Sebe
For semantic-guided cross-view image translation, it is crucial to learn where to sample pixels from the source view image and where to reallocate them guided by the target view semantic map, especially when there is little overlap or drastic view difference between the source and target images.
1 code implementation • 7 Jul 2022 • Zhan Chen, Hong Liu, Tianyu Guo, Zhengyan Chen, Pinhao Song, Hao Tang
First, SkeleMix utilizes the topological information of skeleton data to mix two skeleton sequences by randomly combining the cropped skeleton fragments (the trimmed view) with the remaining skeleton sequences (the truncated view).
1 code implementation • 4 Jul 2022 • Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, Nicu Sebe
To address this limitation, we propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attention.
1 code implementation • 1 Jul 2022 • Jichao Zhang, Jingjing Chen, Hao Tang, Enver Sangineto, Peng Wu, Yan Yan, Nicu Sebe, Wei Wang
Solving this problem using an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels.
1 code implementation • 29 Jun 2022 • Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang, Gordon Wetzstein, Leonidas Guibas, Luc van Gool, Radu Timofte
Generative models have emerged as an essential building block for many image synthesis and editing tasks.
no code implementations • 13 Jun 2022 • Wenhao Li, Hong Liu, Tianyu Guo, Hao Tang, Runwei Ding
To the best of our knowledge, this is the first MLP-Like architecture for 3D human pose estimation in a single frame and a video sequence.
Ranked #44 on 3D Human Pose Estimation on Human3.6M
no code implementations • 13 Jun 2022 • Hao Tang, Kevin Ellis
Toward combining inductive reasoning with perception abilities, we develop techniques for neurosymbolic program synthesis where perceptual input is first parsed by neural nets into a low-dimensional interpretable representation, which is then processed by a synthesized program.
no code implementations • 7 Jun 2022 • Shanlin Sun, Kun Han, Hao Tang, Deying Kong, Junayed Naushad, Xiangyi Yan, Xiaohui Xie
Traditional methods for image registration are primarily optimization-driven, finding the optimal deformations that maximize the similarity between two images.
no code implementations • 2 Jun 2022 • Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang
In this work, we demonstrate real-time portrait stylization, specifically, translating self-portraits into cartoon or anime style on mobile devices.
1 code implementation • 2 Jun 2022 • Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian
To solve these limitations, we propose: (i) a Dynamic Editing Block (DEBlock) which composes different editing modules dynamically for various editing requirements.
no code implementations • 25 May 2022 • Linhui Dai, Hong Liu, Hao Tang, Zhiwei Wu, Pinhao Song
Comprehensive experiments on several challenging datasets show that our method achieves superior performance on the AOOD task.
no code implementations • 25 Apr 2022 • Gene-Ping Yang, Hao Tang
The attention mechanism in sequence-to-sequence models is designed to model the alignments between acoustic features and output tokens in speech recognition.
no code implementations • 2 Apr 2022 • Zeyong Wei, Honghua Chen, Hao Tang, Qian Xie, Mingqiang Wei, Jun Wang
The circle is one of the fundamental geometric primitives of man-made engineering objects.
1 code implementation • 29 Mar 2022 • Sung-Lin Yeh, Hao Tang
While several self-supervised approaches for learning discrete speech representation have been proposed, it is unclear how these seemingly similar approaches relate to each other.
1 code implementation • 24 Mar 2022 • Kai Zhang, Yawei Li, Jingyun Liang, JieZhang Cao, Yulun Zhang, Hao Tang, Radu Timofte, Luc van Gool
While recent years have witnessed a dramatic upsurge of exploiting deep neural networks toward solving image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved.
1 code implementation • 22 Mar 2022 • Songsong Wu, Hao Tang, Xiao-Yuan Jing, Haifeng Zhao, Jianjun Qian, Nicu Sebe, Yan Yan
In this paper, we tackle the problem of synthesizing a ground-view panorama image conditioned on a top-view aerial image, which is a challenging problem due to the large gap between the two image domains with different view-points.
1 code implementation • CVPR 2022 • Shanlin Sun, Kun Han, Deying Kong, Hao Tang, Xiangyi Yan, Xiaohui Xie
Recently DIFs-based methods have been proposed to handle shape reconstruction and dense point correspondences simultaneously, capturing semantic relationships across shapes of the same class by learning a DIFs-modeled shape template.
1 code implementation • ACL 2022 • Tom Hosking, Hao Tang, Mirella Lapata
We propose a generative model of paraphrase generation that encourages syntactic diversity by conditioning on an explicit syntactic sketch.
Ranked #1 on Paraphrase Generation on MSCOCO
no code implementations • 4 Mar 2022 • Yu-Xin Jin, Jun-Jie Hu, Qi Li, Zhi-Cheng Luo, Fang-Yan Zhang, Hao Tang, Kun Qian, Xian-Min Jin
New COVID-19 epidemic strains like Delta and Omicron with increased transmissibility and pathogenicity emerge and spread across the whole world rapidly while causing high mortality during the pandemic period.
1 code implementation • 28 Feb 2022 • Hao Tang, Ling Shao, Philip H. S. Torr, Nicu Sebe
To learn more discriminative class-specific feature representations for the local generation, we also propose a novel classification module.
no code implementations • 25 Feb 2022 • Kun Han, Shanlin Sun, Xiangyi Yan, Chenyu You, Hao Tang, Junayed Naushad, Haoyu Ma, Deying Kong, Xiaohui Xie
Here we propose a new optimization-based method named DNVF (Diffeomorphic Image Registration with Neural Velocity Field), which utilizes a deep neural network to model the space of admissible transformations.
no code implementations • 8 Feb 2022 • Yue Song, Hao Tang, Nicu Sebe, Wei Wang
Specifically, the detail modeling focuses on capturing the object edges under the supervision of an explicitly decomposed detail label that consists of the pixels nested on and near the edge.
1 code implementation • 1 Feb 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci
To fill this gap, in this paper we introduce a novel attentive feature distillation approach to mitigate catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.
no code implementations • CVPR 2022 • Zhenyu Zhang, Yanhao Ge, Ying Tai, Weijian Cao, Renwang Chen, Kunlin Liu, Hao Tang, Xiaoming Huang, Chengjie Wang, Zhifeng Xie, Dongjin Huang
This paper presents a novel Physically-guided Disentangled Implicit Rendering (PhyDIR) framework for high-fidelity 3D face modeling.
no code implementations • CVPR 2022 • Zhenyu Zhang, Yanhao Ge, Ying Tai, Xiaoming Huang, Chengjie Wang, Hao Tang, Dongjin Huang, Zhifeng Xie
In-the-wild 3D face modelling is a challenging problem as the predicted facial geometry and texture suffer from a lack of reliable clues or priors, when the input images are degraded.
1 code implementation • 27 Dec 2021 • Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang
Moreover, our framework can guarantee the identified model to meet resource specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile platforms.
1 code implementation • 14 Dec 2021 • Haoyu Chen, Hao Tang, Zitong Yu, Nicu Sebe, Guoying Zhao
Specifically, we propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies across the given meshes.
1 code implementation • 14 Dec 2021 • Yidi Li, Hong Liu, Hao Tang
Multi-modal fusion is proven to be an effective method to improve the accuracy and robustness of speaker tracking, especially in complex scenarios.
1 code implementation • 10 Dec 2021 • Nicola Dall'Asen, Yiming Wang, Hao Tang, Luca Zanella, Elisa Ricci
With the goal to maintain the geometric attributes of the source face, i.e., the facial pose and expression, and to promote more natural face generation, we propose to exploit a Bipartite Graph to explicitly model the relations between the facial landmarks of the source identity and the ones of the condition identity through a deep model.
1 code implementation • 2 Dec 2021 • Jichao Zhang, Enver Sangineto, Hao Tang, Aliaksandr Siarohin, Zhun Zhong, Nicu Sebe, Wei Wang
However, they usually struggle to generate high-quality images representing non-rigid objects, such as the human body, which is of great interest for many computer graphics applications.
1 code implementation • CVPR 2022 • Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc van Gool, Errui Ding
We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations.
1 code implementation • CVPR 2022 • Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc van Gool
Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion.
Ranked #8 on 3D Human Pose Estimation on MPI-INF-3DHP
1 code implementation • 19 Nov 2021 • Guanglei Yang, Zhun Zhong, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci
Specifically, in the image translation stage, Bi-Mix leverages the knowledge of day-night image pairs to improve the quality of nighttime image relighting.
1 code implementation • 19 Nov 2021 • Guanglei Yang, Hao Tang, Humphrey Shi, Mingli Ding, Nicu Sebe, Radu Timofte, Luc van Gool, Elisa Ricci
The global alignment network aims to transfer the input image from the source domain to the target domain.
no code implementations • 25 Oct 2021 • Yijie Zhong, Bo Li, Lv Tang, Hao Tang, Shouhong Ding
With a lightweight basic convolution block, we build a two-stage framework: a Segmentation Network (SN) is designed to capture sufficient semantics and classify the pixels into unknown, foreground and background regions; a Matting Refine Network (MRN) aims at capturing detailed texture information and regressing accurate alpha values.
1 code implementation • 20 Oct 2021 • Haoyu Chen, Hao Tang, Nicu Sebe, Guoying Zhao
Instead, we introduce AniFormer, a novel Transformer-based architecture, that generates animated 3D sequences by directly taking the raw driving sequences and arbitrary same-type target meshes as inputs.
no code implementations • 20 Oct 2021 • Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie
One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits its performance.
1 code implementation • 19 Oct 2021 • Bin Ren, Hao Tang, Nicu Sebe
To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one refined pixel-level loss in the second stage.
1 code implementation • 18 Oct 2021 • Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin, Xiaohui Xie
The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views.
Ranked #14 on 3D Human Pose Estimation on Human3.6M (using extra training data)
no code implementations • 29 Sep 2021 • Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang
Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult.
no code implementations • 29 Sep 2021 • Haoyu Ma, Yifan Huang, Tianlong Chen, Hao Tang, Chenyu You, Zhangyang Wang, Xiaohui Xie
However, it is unclear why the distorted distribution of the logits is catastrophic to the student model.
no code implementations • 29 Sep 2021 • Fan-Yun Sun, Jonathan Kuck, Hao Tang, Stefano Ermon
Several indices used in a factor graph data structure can be permuted without changing the underlying probability distribution.
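A small sketch of this invariance, assuming a dictionary-based factor representation that is not taken from the paper: reordering the variables inside a factor (with its table keys reordered to match) and reordering the list of factors both leave the unnormalized joint probability unchanged. The names `joint` and `permute_factor` are hypothetical helpers.

```python
from itertools import product


def joint(factors, assignment):
    """Unnormalized probability: product of factor values under `assignment`.

    Each factor is (variable_names, table), where `table` maps a tuple of
    variable values (in the order of `variable_names`) to a weight.
    """
    p = 1.0
    for names, table in factors:
        p *= table[tuple(assignment[n] for n in names)]
    return p


def permute_factor(factor, order):
    """Reorder a factor's variables and permute its table keys to match."""
    names, table = factor
    new_names = tuple(names[i] for i in order)
    new_table = {tuple(key[i] for i in order): w for key, w in table.items()}
    return new_names, new_table


# Two toy factors over binary variables a, b, c.
f1 = (("a", "b"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
f2 = (("b", "c"), {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 2.5, (1, 1): 0.25})
factors = [f1, f2]

# Swap the variable order inside each factor and swap the factor order.
permuted = [permute_factor(f2, (1, 0)), permute_factor(f1, (1, 0))]

assignments = [dict(zip("abc", values)) for values in product((0, 1), repeat=3)]
same = all(joint(factors, x) == joint(permuted, x) for x in assignments)
print(same)
```

A model that consumes the raw index layout would see `factors` and `permuted` as different inputs even though they encode the same distribution, which is the kind of symmetry the sentence above refers to.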
no code implementations • EMNLP (insights) 2021 • Ramon Sanabria, Hao Tang, Sharon Goldwater
Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks.
1 code implementation • 1 Sep 2021 • Hao Tang, Guoshuai Zhao, Yuxia Wu, Xueming Qian
Therefore, we propose a Multi-Sample based Contrastive Loss (MSCL) function which solves the two problems by balancing the importance of positive and negative samples and data augmentation.
1 code implementation • 29 Aug 2021 • Hao Tang, Nicu Sebe
In this paper, we address the task of layout-to-image translation, which aims to translate an input semantic layout to a realistic image.
1 code implementation • ICCV 2021 • Haoyu Chen, Hao Tang, Henglin Shi, Wei Peng, Nicu Sebe, Guoying Zhao
With the strength of deep generative models, 3D pose transfer has regained intensive research interest in recent years.
1 code implementation • ICCV 2021 • Hao Tang, Xingwei Liu, Shanlin Sun, Xiangyi Yan, Xiaohui Xie
Although having achieved great success in medical image segmentation, deep convolutional neural networks usually require a large dataset with manual annotations for training and are difficult to generalize to unseen classes.
no code implementations • 7 Jul 2021 • Gaowen Liu, Hao Tang, Hugo Latapie, Jason Corso, Yan Yan
Particularly, we propose a novel Bi-directional Spatial Temporal Attention Fusion Generative Adversarial Network (STA-GAN) to learn both spatial and temporal information to generate egocentric video sequences from the exocentric view.
1 code implementation • 29 Jun 2021 • Lei Ding, Dong Lin, Shaofu Lin, Jing Zhang, Xiaojie Cui, Yuebin Wang, Hao Tang, Lorenzo Bruzzone
To overcome this limitation, we propose a Wide-Context Network (WiCoNet) for the semantic segmentation of HR RSIs.
1 code implementation • 21 Jun 2021 • Hao Tang, Nicu Sebe
Both generators are mutually connected and trained in an end-to-end fashion and explicitly form three cycled subnets, i.e., one image generation cycle and two guidance generation cycles.
1 code implementation • 31 May 2021 • Jichao Zhang, Aliaksandr Siarohin, Hao Tang, Enver Sangineto, Wei Wang, Humphrey Shi, Nicu Sebe
Moreover, we propose a novel Self-Training Part Replacement (STPR) strategy to refine the model for the texture-transfer task, which improves the quality of the generated clothes and the preservation ability of non-target regions.
1 code implementation • 28 May 2021 • Guanglei Yang, Hao Tang, Zhun Zhong, Mingli Ding, Ling Shao, Nicu Sebe, Elisa Ricci
In this paper, we study the task of source-free domain adaptation (SFDA), where the source data are not available during target adaptation.
1 code implementation • 12 Apr 2021 • Bin Ren, Hao Tang, Fanyang Meng, Runwei Ding, Ling Shao, Philip H. S. Torr, Nicu Sebe
2D image-based virtual try-on has attracted increased attention from the multimedia and computer vision communities.
1 code implementation • ICCV 2021 • Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci
While convolutional neural networks have shown a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution operation.
Ranked #6 on Depth Estimation on NYU-Depth V2
1 code implementation • 22 Feb 2021 • Lei Ding, Hao Tang, Yahui Liu, Yilei Shi, Xiao Xiang Zhu, Lorenzo Bruzzone
To address this issue, we propose an adversarial shape learning network (ASLNet) to model the building shape patterns that improve the accuracy of building segmentation.
no code implementations • 1 Jan 2021 • Hao Tang, Nicu Sebe
We propose the Semantically-Adaptive UpSampling (SA-UpSample), a general and highly effective upsampling method for the layout-to-image translation task.
no code implementations • 16 Dec 2020 • Hao Tang, Xingwei Liu, Kun Han, Shanlin Sun, Narisu Bai, Xuming Chen, Huang Qian, Yong Liu, Xiaohui Xie
State-of-the-art CNN segmentation models apply either 2D or 3D convolutions on input images, with pros and cons associated with each method: 2D convolution is fast, less memory-intensive but inadequate for extracting 3D contextual information from volumetric images, while the opposite is true for 3D convolution.
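The cost side of this trade-off can be made concrete with a back-of-the-envelope count. The sketch below compares weights and multiply-accumulates (MACs) for a single convolution layer applied to a volume either slice by slice (2D) or as a whole (3D). The layer sizes and the helper names `conv_params` and `conv_macs` are illustrative assumptions, not values from the paper; stride 1 and same-size output are assumed, and biases are ignored.

```python
def conv_params(kernel, c_in, c_out, dims):
    """Weight count of a conv layer with a square or cubic kernel (no bias)."""
    return (kernel ** dims) * c_in * c_out


def conv_macs(kernel, c_in, c_out, dims, output_positions):
    """MACs for one layer: every output position applies the full kernel."""
    return conv_params(kernel, c_in, c_out, dims) * output_positions


# Hypothetical volume: 64 slices of 128 x 128, 32 input and 32 output channels.
depth, height, width = 64, 128, 128
positions = depth * height * width  # same output size for both variants

params_2d = conv_params(3, 32, 32, dims=2)
params_3d = conv_params(3, 32, 32, dims=3)
macs_2d = conv_macs(3, 32, 32, 2, positions)
macs_3d = conv_macs(3, 32, 32, 3, positions)

print(params_2d, params_3d, macs_3d // macs_2d)
```

With a kernel size of 3, the 3D layer carries three times the weights and MACs of the 2D layer, because each output aggregates over three neighbouring slices instead of one; those extra slices are also what give the 3D layer its volumetric context.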
no code implementations • NeurIPS 2020 • Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su
Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.
1 code implementation • 26 Oct 2020 • Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su
Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.
no code implementations • NeurIPS 2020 • Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su
We study how to learn a policy with compositional generalizability.
1 code implementation • 29 Aug 2020 • Hao Tang, Song Bai, Nicu Sebe
We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively.
no code implementations • 14 Aug 2020 • Bin Duan, Hao Tang, Wei Wang, Ziliang Zong, Guowei Yang, Yan Yan
Recent works have shown that the attention mechanism is beneficial to the fusion process.
3 code implementations • CVPR 2022 • Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu
To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN).
Ranked #4 on Text-to-Image Generation on CUB
1 code implementation • 10 Aug 2020 • Hao Tang, Song Bai, Philip H. S. Torr, Nicu Sebe
We present a novel Bipartite Graph Reasoning GAN (BiGraphGAN) for the challenging person image generation task.
Ranked #1 on Pose Transfer on Market-1501 (PCKh metric)
1 code implementation • 9 Aug 2020 • Jichao Zhang, Jingjing Chen, Hao Tang, Wei Wang, Yan Yan, Enver Sangineto, Nicu Sebe
In this paper we address the problem of unsupervised gaze correction in the wild, presenting a solution that works without the need for precise annotations of the gaze angle and the head pose.
no code implementations • 6 Aug 2020 • Hao Tang, Anurag Pal, Lu-Feng Qiao, Tian-Yu Wang, Jun Gao, Xian-Min Jin
Collateralized debt obligation (CDO) has been one of the most commonly used structured financial products and is intensively studied in quantitative finance.
no code implementations • 20 Jul 2020 • Hao Ding, Songsong Wu, Hao Tang, Fei Wu, Guangwei Gao, Xiao-Yuan Jing
This is even more laborious when generating images with very different views.
2 code implementations • ECCV 2020 • Hao Tang, Song Bai, Li Zhang, Philip H. S. Torr, Nicu Sebe
We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i.e., translating the pose of a given person to a desired one.
Ranked #1 on Pose Transfer on Market-1501 (IS metric)
no code implementations • ACL 2020 • Qiji Zhou, Yue Zhang, Donghong Ji, Hao Tang
Abstract Meaning Representations (AMRs) capture sentence-level semantic structural representations of broad-coverage natural sentences.
no code implementations • NeurIPS 2020 • Jonathan Kuck, Shuvam Chakraborty, Hao Tang, Rachel Luo, Jiaming Song, Ashish Sabharwal, Stefano Ermon
Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems.
no code implementations • ACL 2020 • Hao Tang, Donghong Ji, Chenliang Li, Qiji Zhou
The idea is to allow the dependency graph to guide the representation learning of the transformer encoder and vice versa.
1 code implementation • 21 May 2020 • Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe
Then the activated dictionary atoms are assembled and passed to the compound dictionary learning and coding layers.
no code implementations • 20 May 2020 • Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, Hao Tang
The module builds a fully connected directed graph between the regions of different density, where each node (region) is represented by a weighted global pooled feature, and a GCN is learned to map this region graph to a set of relation-aware region representations.
2 code implementations • 17 May 2020 • Yu-An Chung, Hao Tang, James Glass
Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.
1 code implementation • 31 Mar 2020 • Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte, Luc van Gool
To tackle 1), we propose to use edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module.
no code implementations • 8 Feb 2020 • Gaowen Liu, Hao Tang, Hugo Latapie, Yan Yan
In this paper, we investigate exocentric (third-person) view to egocentric (first-person) view image generation.
1 code implementation • 3 Feb 2020 • Hao Tang, Philip H. S. Torr, Nicu Sebe
In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results.
no code implementations • 13 Jan 2020 • Shanlin Sun, Yang Liu, Narisu Bai, Hao Tang, Xuming Chen, Qian Huang, Yong Liu, Xiaohui Xie
Organs-at-risk (OAR) delineation in computed tomography (CT) is an important step in Radiation Therapy (RT) planning.
2 code implementations • CVPR 2020 • Hao Tang, Dan Xu, Yan Yan, Philip H. S. Torr, Nicu Sebe
To tackle this issue, in this work we consider learning the scene generation in a local context, and correspondingly design a local class-specific generative network with semantic maps as guidance, which separately constructs and learns sub-generators concentrating on the generation of different classes, and is able to provide more scene details.
1 code implementation • 14 Dec 2019 • Hao Tang, Dan Xu, Hong Liu, Nicu Sebe
In this paper, we analyze the limitations of existing symmetric GAN models in asymmetric translation tasks, and propose an AsymmetricGAN model with translation and reconstruction generators of unequal sizes and different parameter-sharing strategies to adapt to the asymmetric need in both unsupervised and supervised image-to-image translation tasks.
1 code implementation • 12 Dec 2019 • Hao Tang, Hong Liu, Nicu Sebe
The proposed model consists of a single generator and a discriminator taking a conditional image and the target controllable structure as input.
Ranked #1 on Cross-View Image-to-Image Translation on cvusa
Facial Expression Translation
Gesture-to-Gesture Translation
2 code implementations • 27 Nov 2019 • Hao Tang, Hong Liu, Dan Xu, Philip H. S. Torr, Nicu Sebe
State-of-the-art methods in image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data.
Ranked #1 on Facial Expression Translation on CelebA
no code implementations • 20 Nov 2019 • Lei Ding, Hao Tang, Lorenzo Bruzzone
High-level features extracted from the late layers of a neural network are rich in semantic information, yet have blurred spatial details; low-level features extracted from the early layers of a network contain more pixel-level information, but are isolated and noisy.
1 code implementation • 2 Aug 2019 • Hao Tang, Dan Xu, Gaowen Liu, Wei Wang, Nicu Sebe, Yan Yan
In this work, we propose a novel Cycle In Cycle Generative Adversarial Network (C$^2$GAN) for the task of keypoint-guided image generation.
1 code implementation • 25 Jul 2019 • Hao Tang, Chupeng Zhang, Xiaohui Xie
Pulmonary nodule detection, false positive reduction and segmentation represent three of the most common tasks in the computer-aided analysis of chest CT images.
1 code implementation • 3 Jul 2019 • Bin Duan, Wei Wang, Hao Tang, Hugo Latapie, Yan Yan
However, in machine learning, this cross-modal learning is a nontrivial task because different modalities have no homogeneous properties.
1 code implementation • ACL 2019 • Sharmistha Jat, Hao Tang, Partha Talukdar, Tom Mitchell
To the best of our knowledge, this is the first work showing that the MEG brain recording when reading a word in a sentence can be used to distinguish earlier words in the sentence.
no code implementations • arXiv 2019 • Jichao Zhang, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe
Gaze correction aims to redirect the person's gaze into the camera by manipulating the eye region, and it can be considered as a specific image resynthesis problem.
no code implementations • 14 May 2019 • Hao Tang, Wei Wang, Songsong Wu, Xinya Chen, Dan Xu, Nicu Sebe, Yan Yan
In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another one based on an additional expression attribute.
Facial expression generation
Facial Expression Translation
no code implementations • 11 May 2019 • Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass
There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV).
no code implementations • 11 May 2019 • Songsong Wu, Yan Yan, Hao Tang, Jianjun Qian, Jian Zhang, Xiao-Yuan Jing
However, the number of labeled source samples is always limited in practice due to expensive annotation costs, which leads to sub-optimal performance.
no code implementations • 11 May 2019 • Songsong Wu, Zhiqiang Lu, Hao Tang, Yan Yan, Songhao Zhu, Xiao-Yuan Jing, Zuoyong Li
Multi-view subspace clustering aims to divide a set of multisource data into several groups according to their underlying subspace structure.
3 code implementations • CVPR 2019 • Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan
In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map.
Bird View Synthesis
Cross-View Image-to-Image Translation
no code implementations • 7 Apr 2019 • Suwon Shon, Hao Tang, James Glass
In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.
5 code implementations • 5 Apr 2019 • Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass
This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.
8 code implementations • 28 Mar 2019 • Hao Tang, Dan Xu, Nicu Sebe, Yan Yan
To handle the limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes to unwanted parts for semantic manipulation problems without using extra data or models.
Ranked #1 on Facial Expression Translation on Bu3dfe
no code implementations • 23 Mar 2019 • Hao Tang, Xingwei Liu, Xiaohui Xie
Most of the existing deep learning nodule detection systems are constructed in two steps: a) nodule candidates screening and b) false positive reduction, using two different models trained separately.
no code implementations • 23 Mar 2019 • Hao Tang, Daniel R. Kim, Xiaohui Xie
Finally, we introduce a method to ensemble models from both stages via consensus to give the final predictions.
1 code implementation • 23 Mar 2019 • Hao Tang, Chupeng Zhang, Xiaohui Xie
To validate the robustness and performance of our proposed framework trained with a small number of training examples, we further tested our model on CT scans from an independent dataset.
1 code implementation • 31 Jan 2019 • Greg Olmschenk, Hao Tang, Zhigang Zhu
Gatherings of thousands to millions of people frequently occur for an enormous variety of events, and automated counting of these high-density crowds is useful for safety, management, and measuring significance of an event.
1 code implementation • 28 Jan 2019 • Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan
To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN) which is an end-to-end framework and contains two pairs of generators and discriminators, one of which is used to generate faces with attributes while the other one is employed for image-to-sketch translation.
1 code implementation • 15 Jan 2019 • Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe
Gesture recognition is a hot topic in computer vision and pattern recognition, which plays a vitally important role in natural human-computer interface.
Ranked #1 on Hand Gesture Recognition on Cambridge
1 code implementation • 14 Jan 2019 • Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe
State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data.
no code implementations • 27 Nov 2018 • Greg Olmschenk, Zhigang Zhu, Hao Tang
We first demonstrate the capabilities of semi-supervised regression GANs on a toy dataset which allows for a detailed understanding of how they operate in various circumstances.
no code implementations • 31 Oct 2018 • Hao Tang, James Glass
In addition, we study three types of inductive bias, leveraging a pronunciation dictionary, word boundary annotations, and constraints on word durations.
1 code implementation • 12 Sep 2018 • Suwon Shon, Hao Tang, James Glass
In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.
1 code implementation • 11 Sep 2018 • Hao Tang, Heng Wei, Wei Xiao, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe
In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN).
no code implementations • 26 Aug 2018 • Ci-Yu Wang, Jun Gao, Zhi-Qiang Jiao, Lu-Feng Qiao, Ruo-Jing Ren, Zhen Feng, Yuan Chen, Zeng-Quan Yan, Yao Wang, Hao Tang, Xian-Min Jin
Quantum key distribution (QKD), harnessing quantum physics and optoelectronics, may promise unconditionally secure information exchange in theory.
Quantum Physics
1 code implementation • 14 Aug 2018 • Hao Tang, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe
Therefore, this task requires a high-level understanding of the mapping between the input source gesture and the output target gesture.
Ranked #1 on Gesture-to-Gesture Translation on NTU Hand Digit
no code implementations • 9 Jul 2018 • Hao Tang, James Glass
In this paper, we study recurrent networks' ability to learn long-term dependency in the context of speech recognition.
no code implementations • 13 Jun 2018 • Hao Tang, Wei-Ning Hsu, Francois Grondin, James Glass
Speech recognizers trained on close-talking speech do not generalize to distant speech and the word error rate degradation can be as large as 40% absolute.
no code implementations • 13 Jun 2018 • Wei-Ning Hsu, Hao Tang, James Glass
However, it is relatively inexpensive to collect large amounts of unlabeled data from domains that we want the models to generalize to.
1 code implementation • CVPR 2018 • Dan Xu, Wei Wang, Hao Tang, Hong Liu, Nicu Sebe, Elisa Ricci
Recent works have shown the benefit of integrating Conditional Random Fields (CRFs) models into deep architectures for improving pixel-level prediction tasks.
no code implementations • 5 Sep 2017 • Hao Tang
We explore end-to-end training for segmental models with various loss functions, and show how end-to-end training with marginal log loss can eliminate the need for detailed manual alignments.
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
no code implementations • 5 Apr 2017 • Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu
We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches.
no code implementations • 21 Oct 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training.
no code implementations • 26 Sep 2016 • Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu
Recognizing fingerspelling is challenging for a number of reasons: It involves quick, small motions that are often highly coarticulated; it exhibits significant variation between signers; and there has been a dearth of continuous fingerspelling data collected.
no code implementations • 2 Aug 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition.
no code implementations • 13 Feb 2016 • Taehwan Kim, Weiran Wang, Hao Tang, Karen Livescu
Previous work has shown that it is possible to achieve almost 90% accuracy on fingerspelling recognition in a signer-dependent setting.
no code implementations • 22 Jul 2015 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass.