Search Results for author: Hao Tang

Found 214 papers, 110 papers with code

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

2 code implementations • 9 Apr 2024 • Ming Tao, Bing-Kun Bao, Hao Tang, YaoWei Wang, Changsheng Xu

3) The story visualization and continuation models are trained and inferred independently, which is not user-friendly.

Image Generation Story Visualization

144

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

2 code implementations • 4 Apr 2024 • Wencan Cheng, Hao Tang, Luc van Gool, Jong Hwan Ko

Extracting keypoint locations from input hand frames, known as 3D hand pose estimation, is a critical task in various human-computer interaction applications.

3D Hand Pose Estimation

144

Towards Robust 3D Pose Transfer with Adversarial Learning

no code implementations • 2 Apr 2024 • Haoyu Chen, Hao Tang, Ehsan Adeli, Guoying Zhao

This work is driven by the intuition that a model's robustness can be enhanced by introducing adversarial samples into training, yielding a model that is less vulnerable to noisy inputs and that can even be extended to directly handle real-world data such as raw point clouds/scans without intermediate processing.

3D Generation Pose Transfer

On the Faithfulness of Vision Transformer Explanations

no code implementations • 1 Apr 2024 • Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan

In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations.

Versatile Navigation under Partial Observability via Value-guided Diffusion Policy

no code implementations • 1 Apr 2024 • Gengyu Zhang, Hao Tang, Yan Yan

To address these deficiencies, we propose a versatile diffusion-based approach for both 2D and 3D route planning under partial observability.

Autonomous Driving Semantic Segmentation

Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency

no code implementations • 26 Mar 2024 • YingJie Xu, Bangzhen Liu, Hao Tang, Bailin Deng, Shengfeng He

We propose a voxel-based optimization framework, ReVoRF, for few-shot radiance fields that strategically addresses the unreliability in pseudo novel view synthesis.

Novel View Synthesis

Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation

no code implementations • 26 Mar 2024 • Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu

Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain.

Image Segmentation Medical Image Segmentation +1

Towards Online Real-Time Memory-based Video Inpainting Transformers

no code implementations • 24 Mar 2024 • Guillaume Thiry, Hao Tang, Radu Timofte, Luc van Gool

Video inpainting tasks have seen significant improvements in recent years with the rise of deep neural networks and, in particular, vision transformers.

Video Inpainting

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

no code implementations • 21 Mar 2024 • Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan

To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects.

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

no code implementations • 21 Mar 2024 • Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

Each pair of auxiliary mask and box prompts, which removes the need for extra prompts, is associated with class label predictions via the sum of the auxiliary classifier token and the learnable global classifier tokens in the mask decoder of SAM, yielding the semantic label predictions.

Image Segmentation Medical Image Segmentation +2

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion

no code implementations • 16 Mar 2024 • Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Haoye Dong, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang

Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices.

Language Modelling Large Language Model

StableGarment: Garment-Centric Generation via Stable Diffusion

no code implementations • 16 Mar 2024 • Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li

In this paper, we introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on.

Denoising Image Generation +1

GiT: Towards Generalist Vision Transformer through Universal Language Interface

2 code implementations • 14 Mar 2024 • Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, LiWei Wang

Due to its simple design, this paradigm holds promise for narrowing the architectural gap between vision and language.

Language Modelling

205

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior

no code implementations • 14 Mar 2024 • Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao

Semantic image synthesis (SIS) shows good promise for sensor simulation.

Image Generation

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

1 code implementation • 12 Mar 2024 • Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging.

InstructGIE: Towards Generalizable Image Editing

no code implementations • 8 Mar 2024 • Zichong Meng, Changdi Yang, Jun Liu, Hao Tang, Pu Zhao, Yanzhi Wang

In response to this challenge, our study introduces a novel image editing framework with enhanced generalization robustness by boosting in-context learning capability and unifying language instruction.

Denoising In-Context Learning

Hierarchical Indexing for Retrieval-Augmented Opinion Summarization

1 code implementation • 1 Mar 2024 • Tom Hosking, Hao Tang, Mirella Lapata

We show that HIRO learns an encoding space that is more semantically structured than prior work, and generates summaries that are more representative of the opinions in the input reviews.

Opinion Summarization Retrieval

WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment

no code implementations • 19 Feb 2024 • Hao Tang, Darren Key, Kevin Ellis

We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment.

Program Synthesis

Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

no code implementations • 14 Feb 2024 • Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, PengFei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.

Instance Segmentation Semantic Segmentation

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

no code implementations • 17 Jan 2024 • Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli

Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images.

Novel View Synthesis

Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation

no code implementations • 15 Jan 2024 • Hao Tang, Ling Shao, Nicu Sebe, Luc van Gool

Finally, we propose a novel self-guided pre-training method for graph representation learning.

Generative Adversarial Network Graph Representation Learning +1

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

1 code implementation • 26 Dec 2023 • Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing

Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging.

Image Generation

Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement

1 code implementation • 15 Dec 2023 • Xiaofeng Zhang, Zishan Xu, Hao Tang, Chaochen Gu, Wei Chen, Shanying Zhu, Xinping Guan

Low-light image enhancement is a crucial visual task, and many unsupervised methods tend to overlook the degradation of visible information in low-light scenes, which adversely affects the fusion of complementary information and hinders the generation of satisfactory results.

Low-Light Image Enhancement

Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models

no code implementations • 13 Dec 2023 • Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang

In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.

Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples

no code implementations • 10 Nov 2023 • Ziye Fang, Xin Jiang, Hao Tang, Zechao Li

In the field of intelligent multimedia analysis, ultra-fine-grained visual categorization (Ultra-FGVC) plays a vital role in distinguishing intricate subcategories within broader categories.

Contrastive Learning Fine-Grained Visual Categorization

Multi-view Information Integration and Propagation for Occluded Person Re-identification

1 code implementation • 7 Nov 2023 • Neng Dong, Shuanglin Yan, Hao Tang, Jinhui Tang, Liyan Zhang

Moreover, as multiple images with the same identity are not accessible in the testing stage, we devise an Information Propagation (IP) mechanism to distill knowledge from the comprehensive representation to that of a single occluded image.

Person Re-Identification

Towards High-quality HDR Deghosting with Conditional Diffusion Models

no code implementations • 2 Nov 2023 • Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc van Gool, Yanning Zhang

To address this challenge, we formulate the HDR deghosting problem as an image generation task that leverages LDR features as the diffusion model's condition, consisting of the feature condition generator and the noise predictor.

Denoising Image Generation

Towards Matching Phones and Speech Representations

no code implementations • 26 Oct 2023 • Gene-Ping Yang, Hao Tang

We study two key properties that enable matching, namely, whether cluster centroids of self-supervised representations reduce the variability of phone instances and respect the relationship among phones.

Self-Supervised Learning

Blind quantum machine learning with quantum bipartite correlator

no code implementations • 19 Oct 2023 • Changhao Li, Boning Li, Omar Amer, Ruslan Shaydulin, Shouvanik Chakrabarti, Guoqing Wang, Haowei Xu, Hao Tang, Isidor Schoch, Niraj Kumar, Charles Lim, Ju Li, Paola Cappellaro, Marco Pistoia

Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes.

Privacy Preserving Quantum Machine Learning

Pedestrian Accessible Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data for All Pedestrian Types

no code implementations • 15 Oct 2023 • Jiahao Xia, Gavin Gong, Jiawei Liu, Zhigang Zhu, Hao Tang

In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data including LiDAR data and satellite imagery data.

Segmentation Zero Shot Segmentation

Does Graph Distillation See Like Vision Dataset Counterpart?

2 code implementations • NeurIPS 2023 • Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, JianXin Li

We validate the proposed SGDD across 9 datasets and achieve state-of-the-art results on all of them: for example, on the YelpChi dataset, our approach maintains 98.6% test accuracy of training on the original graph dataset with 1,000 times saving on the scale of the graph.

Anomaly Detection Graph Representation Learning +1

Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day

1 code implementation • 4 Oct 2023 • Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao

Although the fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in a notoriously long time and high computational costs.

Image Generation Novel View Synthesis

Distilling ODE Solvers of Diffusion Models into Smaller Steps

no code implementations • 28 Sep 2023 • Sanghwan Kim, Hao Tang, Fisher Yu

Notably, our method incurs negligible computational overhead compared to previous distillation techniques, facilitating straightforward and rapid integration with existing samplers.

Denoising Knowledge Distillation

Light Field Diffusion for Single-View Novel View Synthesis

no code implementations • 20 Sep 2023 • Yifeng Xiong, Haoyu Ma, Shanlin Sun, Kun Han, Hao Tang, Xiaohui Xie

Starting from the camera pose matrices, LFD transforms them into light field encoding, with the same shape as the reference image, to describe the direction of each ray.

Denoising Novel View Synthesis +1

Delving into Multimodal Prompting for Fine-grained Visual Classification

no code implementations • 16 Sep 2023 • Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li

In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pre-training (CLIP) model.

Classification Fine-Grained Image Classification

Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation

1 code implementation • 14 Sep 2023 • Zhaochong An, Guolei Sun, Zongwei Wu, Hao Tang, Luc van Gool

Modern approaches have proved the huge potential of addressing semantic segmentation as a mask classification task which is widely used in instance-level segmentation.

Classification Segmentation +2

Adapting Segment Anything Model for Change Detection in HR Remote Sensing Images

1 code implementation • 4 Sep 2023 • Lei Ding, Kun Zhu, Daifeng Peng, Hao Tang, Kuiwu Yang, Lorenzo Bruzzone

In this work, we aim to utilize the strong visual recognition capabilities of VFMs to improve the change detection of high-resolution Remote Sensing Images (RSIs).

Change Detection Interactive Segmentation

109

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

2 code implementations • ICCV 2023 • Haiyang Wang, Hao Tang, Shaoshuai Shi, Aoxue Li, Zhenguo Li, Bernt Schiele, LiWei Wang

Jointly processing information from multiple sensors is crucial to achieving accurate and robust perception for reliable autonomous driving systems.

Ranked #8 on 3D Object Detection on nuScenes

3D Object Detection Autonomous Driving +2

327

M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition

no code implementations • 6 Aug 2023 • Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, Jinhui Tang

Due to the scarcity of manually annotated data required for fine-grained video understanding, few-shot fine-grained (FS-FG) action recognition has gained significant attention, with the aim of classifying novel fine-grained action categories with only a few labeled instances.

Decision Making Fine-grained Action Recognition +1

Interactive Neural Painting

no code implementations • 31 Jul 2023 • Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP.

Hybrid-CSR: Coupling Explicit and Implicit Shape Representation for Cortical Surface Reconstruction

no code implementations • 23 Jul 2023 • Shanlin Sun, Thanh-Tung Le, Chenyu You, Hao Tang, Kun Han, Haoyu Ma, Deying Kong, Xiangyi Yan, Xiaohui Xie

We present Hybrid-CSR, a geometric deep-learning model that combines explicit and implicit shape representations for cortical surface reconstruction.

Surface Reconstruction

Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis

1 code implementation • 22 Jul 2023 • Hao Tang, Guolei Sun, Nicu Sebe, Luc van Gool

To tackle 2), we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout to preserve the semantic information.

Contrastive Learning Image Generation

Erasing, Transforming, and Noising Defense Network for Occluded Person Re-Identification

1 code implementation • 14 Jul 2023 • Neng Dong, Liyan Zhang, Shuanglin Yan, Hao Tang, Jinhui Tang

Occlusion perturbation presents a significant challenge in person re-identification (re-ID), and existing methods that rely on external visual cues require additional computational resources and only consider the issue of missing information caused by occlusion.

Adversarial Defense Person Re-Identification

Inter-Instance Similarity Modeling for Contrastive Learning

1 code implementation • 21 Jun 2023 • Chengchao Shen, Dawei Liu, Hao Tang, Zhe Qu, Jianxin Wang

In this paper, we propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformer (ViT), to model inter-instance similarities among images.

Contrastive Learning Instance Segmentation +4

Enlighten Anything: When Segment Anything Model Meets Low-Light Image Enhancement

2 code implementations • 17 Jun 2023 • Qihan Zhao, Xiaofeng Zhang, Hao Tang, Chaochen Gu, Shanying Zhu

Image restoration is a low-level visual task, and most CNN methods are designed as black boxes, lacking transparency and intrinsic aesthetics.

Image Restoration Low-Light Image Enhancement +2

104

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

no code implementations • 3 Jun 2023 • Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater

Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units.

Word Embeddings

Edge-guided Representation Learning for Underwater Object Detection

no code implementations • 1 Jun 2023 • Linhui Dai, Hong Liu, Pinhao Song, Hao Tang, Runwei Ding, Shengquan Li

The key to addressing these challenges is to focus the model on obtaining more discriminative information.

Object object-detection +2

Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation

1 code implementation • 24 May 2023 • Tong Xu, Micol Spitale, Hao Tang, Lu Liu, Hatice Gunes, Siyang Song

This means that we approach this problem by considering the generation of a distribution of the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i.e., 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training.

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

no code implementations • 21 May 2023 • Oli Liu, Hao Tang, Sharon Goldwater

Self-supervised speech representations are known to encode both speaker and phonetic information, but how they are distributed in the high-dimensional space remains largely unexplored.

Disentanglement

Attributable and Scalable Opinion Summarization

1 code implementation • 19 May 2023 • Tom Hosking, Hao Tang, Mirella Lapata

We propose a method for unsupervised opinion summarization that encodes sentences from customer reviews into a hierarchical discrete latent space, then identifies common opinions based on the frequency of their encodings.

Ranked #1 on Unsupervised Opinion Summarization on SPACE (Opinion Summarization)

Opinion Summarization Unsupervised Opinion Summarization

Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer

no code implementations • CVPR 2023 • Hao Tang, Songhua Liu, Tianwei Lin, Shaoli Huang, Fu Li, Dongliang He, Xinchao Wang

On the other hand, different from the vanilla version, we adopt a learnable scaling operation on content features before content-style feature interaction, which better preserves the original similarity between a pair of content features while ensuring the stylization quality.

Meta-Learning Style Transfer

SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders

no code implementations • CVPR 2023 • Qingsen Yan, Song Zhang, Weiye Chen, Hao Tang, Yu Zhu, Jinqiu Sun, Luc van Gool, Yanning Zhang

In this work, we propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR.

Few-Shot Learning Pseudo Label

Localized Region Contrast for Enhancing Self-Supervised Learning in Medical Image Segmentation

no code implementations • 6 Apr 2023 • Xiangyi Yan, Junayed Naushad, Chenyu You, Hao Tang, Shanlin Sun, Kun Han, Haoyu Ma, James Duncan, Xiaohui Xie

In this paper, we propose a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation.

Contrastive Learning Image Segmentation +5

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

no code implementations • CVPR 2023 • Guofeng Mei, Hao Tang, Xiaoshui Huang, Weijie Wang, Juan Liu, Jian Zhang, Luc van Gool, Qiang Wu

Deep point cloud registration methods face challenges with partial overlaps and rely on labeled data.

Point Cloud Registration

Graph Transformer GANs for Graph-Constrained House Generation

no code implementations • CVPR 2023 • Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc van Gool

We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.

Generative Adversarial Network House Generation +1

Analysis and Evaluation of Explainable Artificial Intelligence on Suicide Risk Assessment

no code implementations • 9 Mar 2023 • Hao Tang, Aref Miri Rekavandi, Dharjinder Rooprai, Girish Dwivedi, Frank Sanfilippo, Farid Boussaid, Mohammed Bennamoun

This study investigates the effectiveness of Explainable Artificial Intelligence (XAI) techniques in predicting suicide risks and identifying the dominant causes for such behaviours.

Data Augmentation Decision Making +2

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

1 code implementation • CVPR 2023 • Xuan Shen, Yaohua Wang, Ming Lin, Yilun Huang, Hao Tang, Xiuyu Sun, Yanzhi Wang

To this end, a novel framework termed Mathematical Architecture Design for Deep CNN (DeepMAD) is proposed to design high-performance CNN models in a principled way.

Ranked #1 on Neural Architecture Search on ImageNet

Image Classification Neural Architecture Search

344

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased

1 code implementation • 3 Feb 2023 • Chao Yu, Jiaxuan Gao, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu

A crucial limitation of this framework is that every policy in the pool is optimized w.r.t.

Multi-agent Reinforcement Learning

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

2 code implementations • CVPR 2023 • Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu

The complex scene understanding ability of CLIP enables the discriminator to accurately assess the image quality.

Ranked #3 on Text-to-Image Generation on CUB

Scene Understanding Text-to-Image Generation

283

Bipartite Graph Diffusion Model for Human Interaction Generation

1 code implementation • 24 Jan 2023 • Baptiste Chopin, Hao Tang, Mohamed Daoudi

The generation of natural human motion interactions is a hot topic in computer vision and computer animation.

Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge

no code implementations • CVPR 2023 • Changdi Yang, Pu Zhao, Yanyu Li, Wei Niu, Jiexiong Guan, Hao Tang, Minghai Qin, Bin Ren, Xue Lin, Yanzhi Wang

With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications.

Autonomous Driving Segmentation +1

Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification

no code implementations • ICCV 2023 • Jianbing Wu, Hong Liu, Yuxin Su, Wei Shi, Hao Tang

Owing to the large distribution gap between the heterogeneous data in Visible-Infrared Person Re-identification (VI Re-ID), we point out that existing paradigms often suffer from the inter-modal semantic misalignment issue and thus fail to align and compare local details properly.

Cross-Modal Retrieval Person Re-Identification +1

HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design

1 code implementation • 12 Dec 2022 • Hui Wei, Zhixiang Wang, Xuemei Jia, Yinqiang Zheng, Hao Tang, Shin'ichi Satoh, Zheng Wang

Adversarial attacks on thermal infrared imaging expose the risk of related applications.

Adversarial Attack

Few-shot Medical Image Segmentation with Cycle-resemblance Attention

no code implementations • 7 Dec 2022 • Hao Ding, Changchang Sun, Hao Tang, Dawen Cai, Yan Yan

Recently, due to the increasing requirements of medical imaging applications and the professional requirements of annotating medical images, few-shot learning has gained increasing attention in the medical image semantic segmentation field.

Few-Shot Learning Image Segmentation +4

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

1 code implementation • 19 Nov 2022 • Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.

MelHuBERT: A simplified HuBERT on Mel spectrograms

1 code implementation • 17 Nov 2022 • Tzu-Quan Lin, Hung-Yi Lee, Hao Tang

Self-supervised models have had great success in learning speech representations that can generalize to various downstream tasks.

Automatic Speech Recognition Self-Supervised Learning +3

Compressing Transformer-based self-supervised models for speech processing

1 code implementation • 17 Nov 2022 • Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-Yi Lee, Hao Tang

Despite the success of Transformers in self-supervised learning with applications to various downstream tasks, the computational cost of training and inference remains a major challenge for applying these models to a wide spectrum of devices.

Knowledge Distillation Model Compression +1

Deep Unsupervised Key Frame Extraction for Efficient Video Classification

no code implementations • 12 Nov 2022 • Hao Tang, Lei Ding, Songsong Wu, Bin Ren, Nicu Sebe, Paolo Rota

The proposed TSDPC is a generic and powerful framework with two advantages over previous works; one is that it can calculate the number of key frames automatically.

Classification Video Classification

Bipartite Graph Reasoning GANs for Person Pose and Facial Image Synthesis

1 code implementation • 12 Nov 2022 • Hao Tang, Ling Shao, Philip H. S. Torr, Nicu Sebe

To further capture the change in pose of each part more precisely, we propose a novel part-aware bipartite graph reasoning (PBGR) block to decompose the task of reasoning the global structure transformation with a bipartite graph into learning different local transformations for different semantic body/face parts.

Generative Adversarial Network Image Generation

128

Data Level Lottery Ticket Hypothesis for Vision Transformers

1 code implementation • 2 Nov 2022 • Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang

That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches.

Analogical Similarity Informativeness

Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models

no code implementations • 29 Oct 2022 • Sung-Lin Yeh, Hao Tang

While discrete latent variable models have had great success in self-supervised learning, most models assume that frames are independent.

Self-Supervised Learning

Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models

no code implementations • 28 Oct 2022 • Ramon Sanabria, Hao Tang, Sharon Goldwater

Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments.

Word Embeddings

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution

1 code implementation • 27 Oct 2022 • Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang

Moreover, by coupling the proposed sampling method with an unconditional DM, i.e., a DM with no auxiliary inputs to its noise predictor, we can generalize it to a wide range of SR setups.

Super-Resolution

ADPS: Asymmetric Distillation Post-Segmentation for Image Anomaly Detection

no code implementations • 19 Oct 2022 • Peng Xing, Hao Tang, Jinhui Tang, Zechao Li

However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations, and 2) the features of the teacher network serve solely as a "reference standard" and are not fully leveraged.

Anomaly Detection Knowledge Distillation

Paper
Add Code

On Compressing Sequences for Self-Supervised Speech Models

no code implementations • 13 Oct 2022 • Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-Yi Lee, Hao Tang

Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates, but also brings significant speed-up in inference.

Self-Supervised Learning

Paper
Add Code

SiNeRF: Sinusoidal Neural Radiance Fields for Joint Pose Estimation and Scene Reconstruction

1 code implementation • 10 Oct 2022 • Yitong Xia, Hao Tang, Radu Timofte, Luc van Gool

NeRFmm is a Neural Radiance Fields (NeRF) method that deals with joint optimization tasks, i.e., reconstructing real-world scenes and registering camera parameters simultaneously.

Image Generation Pose Estimation

Paper
Code

Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment

1 code implementation • 4 Oct 2022 • Zican Zha, Hao Tang, Yunlian Sun, Jinhui Tang

To address this challenging task, we propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric.

Few-Shot Learning

Paper
Code

Physical Adversarial Attack meets Computer Vision: A Decade Survey

1 code implementation • 30 Sep 2022 • Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Luc van Gool, Zheng Wang

Building upon this foundation, we uncover the pervasive role of artifacts carrying adversarial perturbations in the physical world.

Adversarial Attack Medical Diagnosis

Paper
Code

PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation

1 code implementation • 16 Sep 2022 • Haoyu Ma, Zhe Wang, Yifei Chen, Deying Kong, Liangjian Chen, Xingwei Liu, Xiangyi Yan, Hao Tang, Xiaohui Xie

In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which can locate a rough human mask and perform self-attention only within selected tokens.

Ranked #17 on 3D Human Pose Estimation on Human3.6M (using extra training data)

2D Human Pose Estimation 3D Human Pose Estimation

Paper
Code

Facial Expression Translation using Landmark Guided GANs

1 code implementation • 5 Sep 2022 • Hao Tang, Nicu Sebe

We propose a simple yet powerful Landmark guided Generative Adversarial Network (LandmarkGAN) for the facial expression-to-expression translation using a single image, which is an important and challenging task in computer vision since the expression-to-expression translation is a non-linear and non-aligned problem.

Facial Expression Translation Generative Adversarial Network +1

Paper
Code

Image-Specific Information Suppression and Implicit Local Alignment for Text-based Person Search

no code implementations • 30 Aug 2022 • Shuanglin Yan, Hao Tang, Liyan Zhang, Jinhui Tang

Moreover, existing methods seldom consider the information inequality problem between modalities caused by image-specific information.

Person Search Text based Person Search

Paper
Add Code

Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation

1 code implementation • 26 Aug 2022 • Jichao Zhang, Aliaksandr Siarohin, Yahui Liu, Hao Tang, Nicu Sebe, Wei Wang

Generative Neural Radiance Fields (GNeRF) based 3D-aware GANs have demonstrated remarkable capabilities in generating high-quality images while maintaining strong 3D consistency.

Attribute Disentanglement +2

Paper
Code

Identity-Sensitive Knowledge Propagation for Cloth-Changing Person Re-identification

1 code implementation • 25 Aug 2022 • Jianbing Wu, Hong Liu, Wei Shi, Hao Tang, Jingwen Guo

To mitigate the resolution degradation issue and mine identity-sensitive cues from human faces, we propose to restore the missing facial details using prior facial knowledge, which is then propagated to a smaller network.

Cloth-Changing Person Re-Identification Human Parsing

Paper
Code

G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model

no code implementations • 19 Aug 2022 • Pan Xie, Qipeng Zhang, Taiyi Peng, Hao Tang, Yao Du, Zexian Li

Our approach focuses on the transformation of sign gloss sequences into their corresponding sign pose sequences (G2P).

Denoising Quantization +1

Paper
Add Code

Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

no code implementations • 10 Aug 2022 • Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width.

Quantization

Paper
Add Code

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

1 code implementation • 25 Jul 2022 • Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence.

Neural Architecture Search SSIM +1

Paper
Code

Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation

1 code implementation • 21 Jul 2022 • Guolei Sun, Yun Liu, Hao Tang, Ajad Chhatkuli, Le Zhang, Luc van Gool

The essence of video semantic segmentation (VSS) is how to leverage temporal information for prediction.

Optical Flow Estimation Semantic Segmentation +1

Paper
Code

Towards Interpretable Video Super-Resolution via Alternating Optimization

1 code implementation • 21 Jul 2022 • JieZhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc van Gool

These issues can be alleviated by a cascade of three separate sub-tasks, including video deblurring, frame interpolation, and super-resolution, which, however, would fail to capture the spatial and temporal correlations among video sequences.

Deblurring Space-time Video Super-resolution +2

Paper
Code

MLP-GAN for Brain Vessel Image Segmentation

no code implementations • 17 Jul 2022 • Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

Brain vessel image segmentation can be used as a promising biomarker for better prevention and treatment of different diseases.

Generative Adversarial Network Image Segmentation +2

Paper
Add Code

RCRN: Real-world Character Image Restoration Network via Skeleton Extraction

no code implementations • 16 Jul 2022 • Daqian Shi, Xiaolei Diao, Hao Tang, Xiaomin Li, Hao Xing, Hao Xu

SENet aims to preserve the structural consistency of the character and normalize complex noise.

Image Restoration

Paper
Add Code

CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising

1 code implementation • 16 Jul 2022 • Daqian Shi, Xiaolei Diao, Lida Shi, Hao Tang, Yang Chi, Chuntao Li, Hao Xu

Degraded images commonly exist in the general sources of character images, leading to unsatisfactory character recognition results.

Image Denoising

Paper
Code

RZCR: Zero-shot Character Recognition via Radical-based Reasoning

no code implementations • 12 Jul 2022 • Xiaolei Diao, Daqian Shi, Hao Tang, Qiang Shen, Yanzeng Li, Lei Wu, Hao Xu

The long-tail effect is a common issue that limits the performance of deep learning models on real-world datasets.

Paper
Add Code

PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation

1 code implementation • 9 Jul 2022 • Bin Ren, Hao Tang, Yiming Wang, Xia Li, Wei Wang, Nicu Sebe

For semantic-guided cross-view image translation, it is crucial to learn where to sample pixels from the source view image and where to reallocate them guided by the target view semantic map, especially when there is little overlap or drastic view difference between the source and target images.

Generative Adversarial Network

Paper
Code

Contrastive Learning from Spatio-Temporal Mixed Skeleton Sequences for Self-Supervised Skeleton-Based Action Recognition

1 code implementation • 7 Jul 2022 • Zhan Chen, Hong Liu, Tianyu Guo, Zhengyan Chen, Pinhao Song, Hao Tang

First, SkeleMix utilizes the topological information of skeleton data to mix two skeleton sequences by randomly combining the cropped skeleton fragments (the trimmed view) with the remaining skeleton sequences (the truncated view).

Action Recognition Contrastive Learning +2

Paper
Code

Interaction Transformer for Human Reaction Generation

1 code implementation • 4 Jul 2022 • Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, Nicu Sebe

To address this limitation, we propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attention.

Paper
Code

Unsupervised High-Resolution Portrait Gaze Correction and Animation

1 code implementation • 1 Jul 2022 • Jichao Zhang, Jingjing Chen, Hao Tang, Enver Sangineto, Peng Wu, Yan Yan, Nicu Sebe, Wei Wang

Solving this problem using an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels.

Image Inpainting Vocal Bursts Intensity Prediction

190

Paper
Code

3D-Aware Video Generation

1 code implementation • 29 Jun 2022 • Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang, Gordon Wetzstein, Leonidas Guibas, Luc van Gool, Radu Timofte

Generative models have emerged as an essential building block for many image synthesis and editing tasks.

Image Generation Video Generation

Paper
Code

From Perception to Programs: Regularize, Overparameterize, and Amortize

no code implementations • 13 Jun 2022 • Hao Tang, Kevin Ellis

Toward combining inductive reasoning with perception abilities, we develop techniques for neurosymbolic program synthesis where perceptual input is first parsed by neural nets into a low-dimensional interpretable representation, which is then processed by a synthesized program.

Program Synthesis

Paper
Add Code

GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation

1 code implementation • 13 Jun 2022 • Wenhao Li, Hong Liu, Tianyu Guo, Runwei Ding, Hao Tang

To the best of our knowledge, this is the first MLP-Like architecture for 3D human pose estimation in a single frame and a video sequence.

Ranked #53 on 3D Human Pose Estimation on Human3.6M

3D Human Pose Estimation Representation Learning

Paper
Code

Medical Image Registration via Neural Fields

no code implementations • 7 Jun 2022 • Shanlin Sun, Kun Han, Hao Tang, Deying Kong, Junayed Naushad, Xiangyi Yan, Xiaohui Xie

Traditional methods for image registration are primarily optimization-driven, finding the optimal deformations that maximize the similarity between two images.

Image Registration Medical Image Registration +1

Paper
Add Code

Real-Time Portrait Stylization on the Edge

no code implementations • 2 Jun 2022 • Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang

In this work, we demonstrate real-time portrait stylization, specifically, translating self-portraits into cartoon or anime style on mobile devices.

Paper
Add Code

DE-Net: Dynamic Text-guided Image Editing Adversarial Networks

1 code implementation • 2 Jun 2022 • Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian

To solve these limitations, we propose: (i) a Dynamic Editing Block (DEBlock) which composes different editing modules dynamically for various editing requirements.

text-guided-image-editing

Paper
Code

AO2-DETR: Arbitrary-Oriented Object Detection Transformer

1 code implementation • 25 May 2022 • Linhui Dai, Hong Liu, Hao Tang, Zhiwei Wu, Pinhao Song

Comprehensive experiments on several challenging datasets show that our method achieves superior performance on the AOOD task.

Inductive Bias Object +4

Paper
Code

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

no code implementations • 25 Apr 2022 • Gene-Ping Yang, Hao Tang

The attention mechanism in sequence-to-sequence models is designed to model the alignments between acoustic features and output tokens in speech recognition.

Speech Recognition

Paper
Add Code

Deep Algebraic Fitting for Multiple Circle Primitives Extraction from Raw Point Clouds

no code implementations • 2 Apr 2022 • Zeyong Wei, Honghua Chen, Hao Tang, Qian Xie, Mingqiang Wei, Jun Wang

The circle is one of the fundamental geometric primitives of man-made engineering objects.

Paper
Add Code

Autoregressive Co-Training for Learning Discrete Speech Representations

1 code implementation • 29 Mar 2022 • Sung-Lin Yeh, Hao Tang

While several self-supervised approaches for learning discrete speech representation have been proposed, it is unclear how these seemingly similar approaches relate to each other.

Quantization

Paper
Code

Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis

2 code implementations • 24 Mar 2022 • Kai Zhang, Yawei Li, Jingyun Liang, JieZhang Cao, Yulun Zhang, Hao Tang, Deng-Ping Fan, Radu Timofte, Luc van Gool

While recent years have witnessed a dramatic upsurge of exploiting deep neural networks toward solving image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved.

Ranked #1 on Image Denoising on urban100 sigma15

Image Denoising Image-to-Image Translation

583

Paper
Code

Cross-View Panorama Image Synthesis

1 code implementation • 22 Mar 2022 • Songsong Wu, Hao Tang, Xiao-Yuan Jing, Haifeng Zhao, Jianjun Qian, Nicu Sebe, Yan Yan

In this paper, we tackle the problem of synthesizing a ground-view panorama image conditioned on a top-view aerial image, which is a challenging problem due to the large gap between the two image domains with different view-points.

Image Generation

Paper
Code

Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow

1 code implementation • CVPR 2022 • Shanlin Sun, Kun Han, Deying Kong, Hao Tang, Xiangyi Yan, Xiaohui Xie

Recently DIFs-based methods have been proposed to handle shape reconstruction and dense point correspondences simultaneously, capturing semantic relationships across shapes of the same class by learning a DIFs-modeled shape template.

Organ Segmentation Template Matching

Paper
Code

Hierarchical Sketch Induction for Paraphrase Generation

1 code implementation • ACL 2022 • Tom Hosking, Hao Tang, Mirella Lapata

We propose a generative model of paraphrase generation that encourages syntactic diversity by conditioning on an explicit syntactic sketch.

Ranked #1 on Paraphrase Generation on Paralex

Paraphrase Generation Sentence

Paper
Code

Quantum Deep Learning for Mutant COVID-19 Strain Prediction

no code implementations • 4 Mar 2022 • Yu-Xin Jin, Jun-Jie Hu, Qi Li, Zhi-Cheng Luo, Fang-Yan Zhang, Hao Tang, Kun Qian, Xian-Min Jin

New COVID-19 epidemic strains like Delta and Omicron with increased transmissibility and pathogenicity emerge and spread across the whole world rapidly while causing high mortality during the pandemic period.

Paper
Add Code

Local and Global GANs with Semantic-Aware Upsampling for Image Generation

1 code implementation • 28 Feb 2022 • Hao Tang, Ling Shao, Philip H. S. Torr, Nicu Sebe

To learn more discriminative class-specific feature representations for the local generation, we also propose a novel classification module.

Feature Upsampling Image Generation

145

Paper
Code

Diffeomorphic Image Registration with Neural Velocity Field

no code implementations • 25 Feb 2022 • Kun Han, Shanlin Sun, Xiangyi Yan, Chenyu You, Hao Tang, Junayed Naushad, Haoyu Ma, Deying Kong, Xiaohui Xie

Here we propose a new optimization-based method named DNVF (Diffeomorphic Image Registration with Neural Velocity Field), which utilizes a deep neural network to model the space of admissible transformations.

Image Registration

Paper
Add Code

Disentangle Saliency Detection into Cascaded Detail Modeling and Body Filling

no code implementations • 8 Feb 2022 • Yue Song, Hao Tang, Nicu Sebe, Wei Wang

Specifically, the detail modeling focuses on capturing the object edges under the supervision of an explicitly decomposed detail label, which consists of the pixels that lie on and near the edge.

Object Detection +2

Paper
Add Code

Continual Attentive Fusion for Incremental Learning in Semantic Segmentation

1 code implementation • 1 Feb 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci

To fill this gap, in this paper we introduce a novel attentive feature distillation approach to mitigate catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.

Incremental Learning Semantic Segmentation

Paper
Code

Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling

no code implementations • CVPR 2022 • Zhenyu Zhang, Yanhao Ge, Ying Tai, Weijian Cao, Renwang Chen, Kunlin Liu, Hao Tang, Xiaoming Huang, Chengjie Wang, Zhifeng Xie, Dongjin Huang

This paper presents a novel Physically-guided Disentangled Implicit Rendering (PhyDIR) framework for high-fidelity 3D face modeling.

3D Face Modelling Neural Rendering

Paper
Add Code

Learning To Restore 3D Face From In-the-Wild Degraded Images

no code implementations • CVPR 2022 • Zhenyu Zhang, Yanhao Ge, Ying Tai, Xiaoming Huang, Chengjie Wang, Hao Tang, Dongjin Huang, Zhifeng Xie

In-the-wild 3D face modelling is a challenging problem as the predicted facial geometry and texture suffer from a lack of reliable clues or priors, when the input images are degraded.

3D Face Modelling Face Reconstruction

Paper
Add Code

SPViT: Enabling Faster Vision Transformers via Soft Token Pruning

1 code implementation • 27 Dec 2021 • Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Moreover, our framework can guarantee the identified model to meet resource specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile platforms.

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Efficient ViTs Model Compression

Paper
Code

Geometry-Contrastive Transformer for Generalized 3D Pose Transfer

1 code implementation • 14 Dec 2021 • Haoyu Chen, Hao Tang, Zitong Yu, Nicu Sebe, Guoying Zhao

Specifically, we propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies across the given meshes.

Pose Transfer

Paper
Code

Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking

1 code implementation • 14 Dec 2021 • Yidi Li, Hong Liu, Hao Tang

Multi-modal fusion is proven to be an effective method to improve the accuracy and robustness of speaker tracking, especially in complex scenarios.

Self-Supervised Learning

Paper
Code

Graph-based Generative Face Anonymisation with Pose Preservation

1 code implementation • 10 Dec 2021 • Nicola Dall'Asen, Yiming Wang, Hao Tang, Luca Zanella, Elisa Ricci

With the goal to maintain the geometric attributes of the source face, i.e., the facial pose and expression, and to promote more natural face generation, we propose to exploit a Bipartite Graph to explicitly model the relations between the facial landmarks of the source identity and the ones of the condition identity through a deep model.

Face Detection Face Generation

Paper
Code

3D-Aware Semantic-Guided Generative Model for Human Synthesis

1 code implementation • 2 Dec 2021 • Jichao Zhang, Enver Sangineto, Hao Tang, Aliaksandr Siarohin, Zhun Zhong, Nicu Sebe, Wei Wang

However, they usually struggle to generate high-quality images representing non-rigid objects, such as the human body, which is of great interest for many computer graphics applications.

3D-Aware Image Synthesis

Paper
Code

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

1 code implementation • CVPR 2022 • Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc van Gool, Errui Ding

We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations.

Image Manipulation Language Modelling

Paper
Code

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

1 code implementation • CVPR 2022 • Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc van Gool

Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion.

Ranked #22 on 3D Human Pose Estimation on MPI-INF-3DHP

3D Human Pose Estimation

494

Paper
Code

Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation

1 code implementation • 19 Nov 2021 • Guanglei Yang, Zhun Zhong, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci

Specifically, in the image translation stage, Bi-Mix leverages the knowledge of day-night image pairs to improve the quality of nighttime image relighting.

Autonomous Driving Image Relighting +3

Paper
Code

Global and Local Alignment Networks for Unpaired Image-to-Image Translation

1 code implementation • 19 Nov 2021 • Guanglei Yang, Hao Tang, Humphrey Shi, Mingli Ding, Nicu Sebe, Radu Timofte, Luc van Gool, Elisa Ricci

The global alignment network aims to transfer the input image from the source domain to the target domain.

Image-to-Image Translation Translation

Paper
Code

Highly Efficient Natural Image Matting

no code implementations • 25 Oct 2021 • Yijie Zhong, Bo Li, Lv Tang, Hao Tang, Shouhong Ding

With a lightweight basic convolution block, we build a two-stage framework: a Segmentation Network (SN) is designed to capture sufficient semantics and classify the pixels into unknown, foreground, and background regions; a Matting Refine Network (MRN) aims at capturing detailed texture information and regressing accurate alpha values.

Image Matting

Paper
Add Code

AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

no code implementations • 20 Oct 2021 • Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie

One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits its performance.

Image Segmentation Medical Image Segmentation +3

Paper
Add Code

AniFormer: Data-driven 3D Animation with Transformer

1 code implementation • 20 Oct 2021 • Haoyu Chen, Hao Tang, Nicu Sebe, Guoying Zhao

Instead, we introduce AniFormer, a novel Transformer-based architecture that generates animated 3D sequences by directly taking the raw driving sequences and arbitrary same-type target meshes as inputs.

regression

Paper
Code

Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation

1 code implementation • 19 Oct 2021 • Bin Ren, Hao Tang, Nicu Sebe

To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one refined pixel-level loss in the second stage.

Translation

Paper
Code

TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation

1 code implementation • 18 Oct 2021 • Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin, Xiaohui Xie

The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views.

Ranked #20 on 3D Human Pose Estimation on Human3.6M (using extra training data)

3D Human Pose Estimation 3D Pose Estimation

Paper
Code

Stingy Teacher: Sparse Logits Suffice to Fail Knowledge Distillation

no code implementations • 29 Sep 2021 • Haoyu Ma, Yifan Huang, Tianlong Chen, Hao Tang, Chenyu You, Zhangyang Wang, Xiaohui Xie

However, it is unclear why the distorted distribution of the logits is catastrophic to the student model.

Knowledge Distillation

Paper
Add Code

Equivariant Neural Network for Factor Graphs

no code implementations • 29 Sep 2021 • Fan-Yun Sun, Jonathan Kuck, Hao Tang, Stefano Ermon

Several indices used in a factor graph data structure can be permuted without changing the underlying probability distribution.

Inductive Bias

Paper
Add Code

HFSP: A Hardware-friendly Soft Pruning Framework for Vision Transformers

no code implementations • 29 Sep 2021 • Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult.

Image Classification Model Compression

Paper
Add Code

On the Difficulty of Segmenting Words with Attention

no code implementations • EMNLP (insights) 2021 • Ramon Sanabria, Hao Tang, Sharon Goldwater

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks.

Segmentation speech-recognition +2

Paper
Add Code

Multi-Sample based Contrastive Loss for Top-k Recommendation

1 code implementation • 1 Sep 2021 • Hao Tang, Guoshuai Zhao, Yuxia Wu, Xueming Qian

Therefore, we propose a Multi-Sample based Contrastive Loss (MSCL) function which solves the two problems by balancing the importance of positive and negative samples and data augmentation.

Contrastive Learning Data Augmentation +1

Paper
Code

Layout-to-Image Translation with Double Pooling Generative Adversarial Networks

1 code implementation • 29 Aug 2021 • Hao Tang, Nicu Sebe

In this paper, we address the task of layout-to-image translation, which aims to translate an input semantic layout to a realistic image.

Translation

Paper
Code

Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer

1 code implementation • ICCV 2021 • Haoyu Chen, Hao Tang, Henglin Shi, Wei Peng, Nicu Sebe, Guoying Zhao

With the strength of deep generative models, 3D pose transfer has regained intensive research interest in recent years.

Disentanglement Generative Adversarial Network +1

Paper
Code

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

1 code implementation • ICCV 2021 • Hao Tang, Xingwei Liu, Shanlin Sun, Xiangyi Yan, Xiaohui Xie

Although having achieved great success in medical image segmentation, deep convolutional neural networks usually require a large dataset with manual annotations for training and are difficult to generalize to unseen classes.

Few-Shot Learning Image Segmentation +4

Paper
Code

Cross-View Exocentric to Egocentric Video Synthesis

no code implementations • 7 Jul 2021 • Gaowen Liu, Hao Tang, Hugo Latapie, Jason Corso, Yan Yan

Particularly, we propose a novel Bi-directional Spatial Temporal Attention Fusion Generative Adversarial Network (STA-GAN) to learn both spatial and temporal information to generate egocentric video sequences from the exocentric view.

Generative Adversarial Network Video Generation

Paper
Add Code

Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images

1 code implementation • 29 Jun 2021 • Lei Ding, Dong Lin, Shaofu Lin, Jing Zhang, Xiaojie Cui, Yuebin Wang, Hao Tang, Lorenzo Bruzzone

To overcome this limitation, we propose a Wide-Context Network (WiCoNet) for the semantic segmentation of HR RSIs.

Image Cropping Semantic Segmentation

Paper
Code

Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes

1 code implementation • 21 Jun 2021 • Hao Tang, Nicu Sebe

Both generators are mutually connected and trained in an end-to-end fashion and explicitly form three cycled subnets, i.e., one image generation cycle and two guidance generation cycles.

Generative Adversarial Network Image-to-Image Translation +1

Paper
Code

Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization

1 code implementation • 31 May 2021 • Jichao Zhang, Aliaksandr Siarohin, Hao Tang, Enver Sangineto, Wei Wang, Humphrey Shi, Nicu Sebe

Moreover, we propose a novel Self-Training Part Replacement (STPR) strategy to refine the model for the texture-transfer task, which improves the quality of the generated clothes and the preservation ability of non-target regions.

Image-to-Image Translation Pose Transfer +1

Paper
Code

Transformer-Based Source-Free Domain Adaptation

1 code implementation • 28 May 2021 • Guanglei Yang, Hao Tang, Zhun Zhong, Mingli Ding, Ling Shao, Nicu Sebe, Elisa Ricci

In this paper, we study the task of source-free domain adaptation (SFDA), where the source data are not available during target adaptation.

Knowledge Distillation Source-Free Domain Adaptation

Paper
Code

Cloth Interactive Transformer for Virtual Try-On

1 code implementation • 12 Apr 2021 • Bin Ren, Hao Tang, Fanyang Meng, Runwei Ding, Philip H. S. Torr, Nicu Sebe

In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.

Virtual Try-on

Paper
Code

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

1 code implementation • ICCV 2021 • Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci

While convolutional neural networks have shown a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution operation.

Ranked #8 on Depth Estimation on NYU-Depth V2

Depth Estimation Depth Prediction +1

168

Paper
Code

Adversarial Shape Learning for Building Extraction in VHR Remote Sensing Images

1 code implementation • 22 Feb 2021 • Lei Ding, Hao Tang, Yahui Liu, Yilei Shi, Xiao Xiang Zhu, Lorenzo Bruzzone

To address this issue, we propose an adversarial shape learning network (ASLNet) to model the building shape patterns that improve the accuracy of building segmentation.

Segmentation

Paper
Code

Semantically-Adaptive Upsampling for Layout-to-Image Translation

no code implementations • 1 Jan 2021 • Hao Tang, Nicu Sebe

We propose the Semantically-Adaptive UpSampling (SA-UpSample), a general and highly effective upsampling method for the layout-to-image translation task.

Translation

Paper
Add Code

Spatial Context-Aware Self-Attention Model For Multi-Organ Segmentation

no code implementations • 16 Dec 2020 • Hao Tang, Xingwei Liu, Kun Han, Shanlin Sun, Narisu Bai, Xuming Chen, Huang Qian, Yong Liu, Xiaohui Xie

State-of-the-art CNN segmentation models apply either 2D or 3D convolutions on input images, with pros and cons associated with each method: 2D convolution is fast, less memory-intensive but inadequate for extracting 3D contextual information from volumetric images, while the opposite is true for 3D convolution.

Image Segmentation Organ Segmentation +2

Paper
Add Code

Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs

no code implementations • NeurIPS 2020 • Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su

Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.

Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous Graph Neural Networks

1 code implementation • 26 Oct 2020 • Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su

Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

no code implementations • NeurIPS 2020 • Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su

We study how to learn a policy with compositional generalizability.

Inductive Bias Self-Supervised Learning

Dual Attention GANs for Semantic Image Synthesis

1 code implementation • 29 Aug 2020 • Hao Tang, Song Bai, Nicu Sebe

We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively.

Image Generation Position

110

Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention

no code implementations • 14 Aug 2020 • Bin Duan, Hao Tang, Wei Wang, Ziliang Zong, Guowei Yang, Yan Yan

Recent works have shown that the attention mechanism is beneficial to the fusion process.

audio-visual event localization valid

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

3 code implementations • CVPR 2022 • Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu

To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN).

Ranked #4 on Text-to-Image Generation on CUB (Inception score metric)

Text Matching Text-to-Image Generation

283

Bipartite Graph Reasoning GANs for Person Image Generation

1 code implementation • 10 Aug 2020 • Hao Tang, Song Bai, Philip H. S. Torr, Nicu Sebe

We present a novel Bipartite Graph Reasoning GAN (BiGraphGAN) for the challenging person image generation task.

Ranked #1 on Pose Transfer on Market-1501 (PCKh metric)

Pose Transfer

128

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild

1 code implementation • 9 Aug 2020 • Jichao Zhang, Jingjing Chen, Hao Tang, Wei Wang, Yan Yan, Enver Sangineto, Nicu Sebe

In this paper we address the problem of unsupervised gaze correction in the wild, presenting a solution that works without the need for precise annotations of the gaze angle and the head pose.

146

Quantum Computation for Pricing the Collateralized Debt Obligations

no code implementations • 6 Aug 2020 • Hao Tang, Anurag Pal, Lu-Feng Qiao, Tian-Yu Wang, Jun Gao, Xian-Min Jin

Collateralized debt obligation (CDO) has been one of the most commonly used structured financial products and is intensively studied in quantitative finance.

Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism

no code implementations • 20 Jul 2020 • Hao Ding, Songsong Wu, Hao Tang, Fei Wu, Guangwei Gao, Xiao-Yuan Jing

This is even more laborious when generating images with very different views.

Image Generation

XingGAN for Person Image Generation

2 code implementations • ECCV 2020 • Hao Tang, Song Bai, Li Zhang, Philip H. S. Torr, Nicu Sebe

We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i.e., translating the pose of a given person to a desired one.

Ranked #1 on Pose Transfer on Market-1501 (IS metric)

Generative Adversarial Network Pose Transfer

229

AMR Parsing with Latent Structural Information

no code implementations • ACL 2020 • Qiji Zhou, Yue Zhang, Donghong Ji, Hao Tang

Abstract Meaning Representations (AMRs) capture sentence-level semantics structural representations to broad-coverage natural sentences.

AMR Parsing Sentence

Dependency Graph Enhanced Dual-transformer Structure for Aspect-based Sentiment Classification

no code implementations • ACL 2020 • Hao Tang, Donghong Ji, Chenliang Li, Qiji Zhou

The idea is to allow the dependency graph to guide the representation learning of the transformer encoder and vice versa.

General Classification Representation Learning +3

Belief Propagation Neural Networks

1 code implementation • NeurIPS 2020 • Jonathan Kuck, Shuvam Chakraborty, Hao Tang, Rachel Luo, Jiaming Song, Ashish Sabharwal, Stefano Ermon

Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems.

Combinatorial Optimization

When Dictionary Learning Meets Deep Learning: Deep Dictionary Learning and Coding Network for Image Recognition with Limited Data

1 code implementation • 21 May 2020 • Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe

Then the activated dictionary atoms are assembled and passed to the compound dictionary learning and coding layers.

Dictionary Learning

Relevant Region Prediction for Crowd Counting

no code implementations • 20 May 2020 • Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, Hao Tang

The module builds a fully connected directed graph between the regions of different density, where each node (region) is represented by a weighted global pooled feature, and a GCN is learned to map this region graph to a set of relation-aware region representations.

Crowd Counting Relation

Vector-Quantized Autoregressive Predictive Coding

2 code implementations • 17 May 2020 • Yu-An Chung, Hao Tang, James Glass

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis

2 code implementations • 31 Mar 2020 • Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte, Luc van Gool

We propose a novel ECGAN for the challenging semantic image synthesis task.

Contrastive Learning Image Generation

Exocentric to Egocentric Image Generation via Parallel Generative Adversarial Network

no code implementations • 8 Feb 2020 • Gaowen Liu, Hao Tang, Hugo Latapie, Yan Yan

In this paper, we investigate exocentric (third-person) view to egocentric (first-person) view image generation.

Generative Adversarial Network Image Generation

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation

1 code implementation • 3 Feb 2020 • Hao Tang, Philip H. S. Torr, Nicu Sebe

In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results.

Generative Adversarial Network Image-to-Image Translation +1

458

AttentionAnatomy: A unified framework for whole-body organs at risk segmentation using multiple partially annotated datasets

no code implementations • 13 Jan 2020 • Shanlin Sun, Yang Liu, Narisu Bai, Hao Tang, Xuming Chen, Qian Huang, Yong Liu, Xiaohui Xie

Organs-at-risk (OAR) delineation in computed tomography (CT) is an important step in Radiation Therapy (RT) planning.

Computed Tomography (CT)

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

2 code implementations • CVPR 2020 • Hao Tang, Dan Xu, Yan Yan, Philip H. S. Torr, Nicu Sebe

To tackle this issue, in this work we consider learning the scene generation in a local context, and correspondingly design a local class-specific generative network with semantic maps as guidance, which separately constructs and learns sub-generators concentrating on the generation of different classes, and is able to provide more scene details.

Ranked #2 on Cross-View Image-to-Image Translation on Dayton (256×256) - aerial-to-ground

Image Generation Scene Generation

145

Asymmetric Generative Adversarial Networks for Image-to-Image Translation

1 code implementation • 14 Dec 2019 • Hao Tang, Dan Xu, Hong Liu, Nicu Sebe

In this paper, we analyze the limitations of existing symmetric GAN models in asymmetric translation tasks, and propose an AsymmetricGAN model with translation and reconstruction generators of unequal sizes and different parameter-sharing strategies to adapt to the asymmetric needs of both unsupervised and supervised image-to-image translation tasks.

Image-to-Image Translation Translation

Unified Generative Adversarial Networks for Controllable Image-to-Image Translation

1 code implementation • 12 Dec 2019 • Hao Tang, Hong Liu, Nicu Sebe

The proposed model consists of a single generator and a discriminator taking a conditional image and the target controllable structure as input.

Ranked #1 on Cross-View Image-to-Image Translation on cvusa

Facial Expression Translation Generative Adversarial Network +3

173

Paper
Code

AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks

2 code implementations • 27 Nov 2019 • Hao Tang, Hong Liu, Dan Xu, Philip H. S. Torr, Nicu Sebe

State-of-the-art methods in image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data.

Ranked #1 on Facial Expression Translation on CelebA

Image-to-Image Translation Translation

622

Improving Semantic Segmentation of Aerial Images Using Patch-based Attention

no code implementations • 20 Nov 2019 • Lei Ding, Hao Tang, Lorenzo Bruzzone

High-level features extracted from the late layers of a neural network are rich in semantic information, yet have blurred spatial details; low-level features extracted from the early layers of a network contain more pixel-level information, but are isolated and noisy.

Semantic Segmentation

Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

1 code implementation • 2 Aug 2019 • Hao Tang, Dan Xu, Gaowen Liu, Wei Wang, Nicu Sebe, Yan Yan

In this work, we propose a novel Cycle In Cycle Generative Adversarial Network (C$^2$GAN) for the task of keypoint-guided image generation.

Generative Adversarial Network Image Generation

NoduleNet: Decoupled False Positive Reduction for Pulmonary Nodule Detection and Segmentation

1 code implementation • 25 Jul 2019 • Hao Tang, Chupeng Zhang, Xiaohui Xie

Pulmonary nodule detection, false positive reduction and segmentation represent three of the most common tasks in the computer-aided analysis of chest CT images.

Friction Segmentation

180

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation

1 code implementation • 3 Jul 2019 • Bin Duan, Wei Wang, Hao Tang, Hugo Latapie, Yan Yan

However, in machine learning, this cross-modal learning is a nontrivial task because different modalities have no homogeneous properties.

BIG-bench Machine Learning Translation

Relating Simple Sentence Representations in Deep Neural Networks and the Brain

1 code implementation • ACL 2019 • Sharmistha Jat, Hao Tang, Partha Talukdar, Tom Mitchell

To the best of our knowledge, this is the first work showing that the MEG brain recording when reading a word in a sentence can be used to distinguish earlier words in the sentence.

Sentence

GazeCorrection: Self-Guided Eye Manipulation in the wild using Self-Supervised Generative Adversarial Networks

no code implementations • arXiv 2019 • Jichao Zhang, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe

Gaze correction aims to redirect the person's gaze into the camera by manipulating the eye region, and it can be considered as a specific image resynthesis problem.

Resynthesis

Expression Conditional GAN for Facial Expression-to-Expression Translation

no code implementations • 14 May 2019 • Hao Tang, Wei Wang, Songsong Wu, Xinya Chen, Dan Xu, Nicu Sebe, Yan Yan

In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another one based on an additional expression attribute.

Attribute Facial expression generation +2

Structured Discriminative Tensor Dictionary Learning for Unsupervised Domain Adaptation

no code implementations • 11 May 2019 • Songsong Wu, Yan Yan, Hao Tang, Jianjun Qian, Jian Zhang, Xiao-Yuan Jing

However, the number of labeled source samples is always limited in practice due to expensive annotation costs, which leads to sub-optimal performance.

Dictionary Learning Pseudo Label +1

Joint Learning of Self-Representation and Indicator for Multi-View Image Clustering

no code implementations • 11 May 2019 • Songsong Wu, Zhiqiang Lu, Hao Tang, Yan Yan, Songhao Zhu, Xiao-Yuan Jing, Zuoyong Li

Multi-view subspace clustering aims to divide a set of multisource data into several groups according to their underlying subspace structure.

Clustering Multi-view Subspace Clustering

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

no code implementations • 11 May 2019 • Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

3 code implementations • CVPR 2019 • Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan

In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map.

Ranked #1 on Cross-View Image-to-Image Translation on Dayton (64×64) - aerial-to-ground

Bird View Synthesis Cross-View Image-to-Image Translation +1

458

VoiceID Loss: Speech Enhancement for Speaker Verification

no code implementations • 7 Apr 2019 • Suwon Shon, Hao Tang, James Glass

In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.

Speaker Verification Speech Enhancement

An Unsupervised Autoregressive Model for Speech Representation Learning

5 code implementations • 5 Apr 2019 • Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.

General Classification Representation Learning +1

184

Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation

8 code implementations • 28 Mar 2019 • Hao Tang, Dan Xu, Nicu Sebe, Yan Yan

To handle this limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes to unwanted parts for semantic manipulation problems without using extra data and models.

Ranked #1 on Facial Expression Translation on CelebA

Generative Adversarial Network Translation +1

622

An End-to-end Framework For Integrated Pulmonary Nodule Detection and False Positive Reduction

no code implementations • 23 Mar 2019 • Hao Tang, Xingwei Liu, Xiaohui Xie

Most of the existing deep learning nodule detection systems are constructed in two steps: a) nodule candidates screening and b) false positive reduction, using two different models trained separately.

Computed Tomography (CT)

Automatic Pulmonary Lobe Segmentation Using Deep Learning

1 code implementation • 23 Mar 2019 • Hao Tang, Chupeng Zhang, Xiaohui Xie

To validate the robustness and performance of our proposed framework trained with a small number of training examples, we further tested our model on CT scans from an independent dataset.

Computed Tomography (CT)

Automated pulmonary nodule detection using 3D deep convolutional neural networks

no code implementations • 23 Mar 2019 • Hao Tang, Daniel R. Kim, Xiaohui Xie

Finally, we introduce a method to ensemble models from both stages via consensus to give the final predictions.

Computed Tomography (CT) object-detection +1

Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling

1 code implementation • 31 Jan 2019 • Greg Olmschenk, Hao Tang, Zhigang Zhu

Gatherings of thousands to millions of people frequently occur for an enormous variety of events, and automated counting of these high-density crowds is useful for safety, management, and measuring significance of an event.

Crowd Counting Management

Attribute-Guided Sketch Generation

1 code implementation • 28 Jan 2019 • Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan

To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN) which is an end-to-end framework and contains two pairs of generators and discriminators, one of which is used to generate faces with attributes while the other one is employed for image-to-sketch translation.

Attribute Generative Adversarial Network +1

Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion

1 code implementation • 15 Jan 2019 • Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe

Gesture recognition is a hot topic in computer vision and pattern recognition, which plays a vitally important role in natural human-computer interface.

Ranked #1 on Hand Gesture Recognition on Cambridge

Clustering Hand Gesture Recognition +1

Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

1 code implementation • 14 Jan 2019 • Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe

State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data.

Generative Adversarial Network Image-to-Image Translation +1

Generalizing semi-supervised generative adversarial networks to regression using feature contrasting

no code implementations • 27 Nov 2018 • Greg Olmschenk, Zhigang Zhu, Hao Tang

We first demonstrate the capabilities of semi-supervised regression GANs on a toy dataset which allows for a detailed understanding of how they operate in various circumstances.

Age Estimation Crowd Counting +2

On The Inductive Bias of Words in Acoustics-to-Word Models

no code implementations • 31 Oct 2018 • Hao Tang, James Glass

In addition, we study three types of inductive bias, leveraging a pronunciation dictionary, word boundary annotations, and constraints on word durations.

Inductive Bias

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

1 code implementation • 12 Sep 2018 • Suwon Shon, Hao Tang, James Glass

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.

Speaker Recognition Text-Independent Speaker Recognition

Deep Micro-Dictionary Learning and Coding Network

1 code implementation • 11 Sep 2018 • Hao Tang, Heng Wei, Wei Xiao, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe

In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN).

Dictionary Learning

Integrated Server for Measurement-Device-Independent Quantum Key Distribution Network

no code implementations • 26 Aug 2018 • Ci-Yu Wang, Jun Gao, Zhi-Qiang Jiao, Lu-Feng Qiao, Ruo-Jing Ren, Zhen Feng, Yuan Chen, Zeng-Quan Yan, Yao Wang, Hao Tang, Xian-Min Jin

Quantum key distribution (QKD), harnessing quantum physics and optoelectronics, may promise unconditionally secure information exchange in theory.

Quantum Physics
