1 code implementation • 8 Jul 2025 • Chihan Huang, Hao Tang
In this paper, we introduce a novel approach for generating UAEs based on diffusion models, named ScoreAdv.
no code implementations • 19 Jun 2025 • Xin Jiang, Meiqi Cao, Hao Tang, Fei Shen, Zechao Li
Fine-Grained Image Retrieval (FGIR) faces challenges in learning discriminative visual representations to retrieve images with similar fine-grained features.
no code implementations • 10 Jun 2025 • Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, Eunsol Choi
By separating the LLMs that generate answers from the LLMs that analyze and aggregate the sampled answers, our approach works easily and efficiently with the outputs of premier black-box models.
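The generate-then-aggregate split described above can be illustrated with a minimal sketch (hypothetical helper names, not the paper's implementation): one set of black-box calls samples candidate answers, and a separate call analyzes and aggregates them.

```python
# Minimal generate-then-aggregate sketch (hypothetical helpers, not the paper's
# implementation); `query_llm` stands in for any black-box LLM API.
from collections import Counter

def query_llm(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder for a black-box LLM call; replace with a real API client."""
    raise NotImplementedError

def sample_answers(question: str, n: int = 8) -> list[str]:
    # Generator role: draw diverse candidate answers at a high temperature.
    return [query_llm(question, temperature=1.0) for _ in range(n)]

def aggregate(question: str, answers: list[str]) -> str:
    # Aggregator role: a separate LLM call reads all samples and returns one final answer.
    listing = "\n".join(f"{i + 1}. {a}" for i, a in enumerate(answers))
    prompt = (f"Question: {question}\nCandidate answers:\n{listing}\n"
              "Analyze the candidates and return the single best final answer.")
    return query_llm(prompt, temperature=0.0)

def majority_vote(answers: list[str]) -> str:
    # Simpler aggregation baseline: keep the most frequent candidate answer.
    return Counter(answers).most_common(1)[0][0]
```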
1 code implementation • 8 Jun 2025 • Hao Tang, Chengchao Shen
Specifically, we propose a Spatial Token Fusion (STF) method to learn compact vision tokens for a short vision token sequence, where spatially adjacent tokens are fused into one.
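As a rough illustration of fusing spatially adjacent vision tokens (a generic sketch, not the proposed STF module), each 2x2 neighborhood of patch tokens can be averaged into a single token, shortening the sequence fourfold:

```python
# Generic sketch of fusing spatially adjacent vision tokens (not the STF module
# itself): average every 2x2 neighborhood of patch tokens into one token.
import torch

def fuse_adjacent_tokens(tokens: torch.Tensor, grid_h: int, grid_w: int) -> torch.Tensor:
    """tokens: (batch, grid_h * grid_w, dim) -> (batch, grid_h//2 * grid_w//2, dim)."""
    b, n, d = tokens.shape
    assert n == grid_h * grid_w and grid_h % 2 == 0 and grid_w % 2 == 0
    x = tokens.view(b, grid_h, grid_w, d)
    # Dims 1 and 3 index the 2x2 block position, dims 2 and 4 index within the block.
    x = x.view(b, grid_h // 2, 2, grid_w // 2, 2, d).mean(dim=(2, 4))
    return x.view(b, (grid_h // 2) * (grid_w // 2), d)

tokens = torch.randn(1, 14 * 14, 768)        # e.g., ViT-B/16 patch tokens for a 224x224 image
short = fuse_adjacent_tokens(tokens, 14, 14)  # -> (1, 49, 768)
```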
no code implementations • 6 Jun 2025 • Fanhu Zeng, Deli Yu, Zhenglun Kong, Hao Tang
In this paper, we rethink token reduction and unify the process as an explicit form of token matrix transformation, in which all existing methods construct special forms of matrices within this framework.
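The matrix-transformation view of token reduction can be made concrete with a small numerical sketch (my own illustration of the framing, not the paper's code): both pruning and merging map N tokens to M tokens by left-multiplying the token matrix with an M x N matrix.

```python
# Illustration of token reduction as a matrix transformation Y = T @ X, where
# X is an (N, d) token matrix and T is (M, N); not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))          # 6 tokens, 4 dimensions each

# Pruning: keep tokens 0, 2, 5 -> T is a row-selection matrix.
keep = [0, 2, 5]
T_prune = np.eye(6)[keep]            # shape (3, 6)
Y_prune = T_prune @ X                # identical to X[keep]

# Merging: average token pairs {0,1}, {2,3}, {4,5} -> T has averaging rows.
T_merge = np.zeros((3, 6))
T_merge[0, [0, 1]] = 0.5
T_merge[1, [2, 3]] = 0.5
T_merge[2, [4, 5]] = 0.5
Y_merge = T_merge @ X                # identical to the pairwise token means

assert np.allclose(Y_prune, X[keep])
```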
no code implementations • 2 Jun 2025 • Kaixun Jiang, Zhaoyu Chen, Haijing Guo, Jinglun Li, Jiyuan Fu, Pinxue Guo, Hao Tang, Bo Li, Wenqiang Zhang
Unlike benign alignment, adversarial alignment involves two inherently conflicting preferences: visual consistency and attack effectiveness, which often lead to unstable optimization and reward hacking (e.g., reducing visual quality to improve attack success).
no code implementations • 29 May 2025 • Xiaoyi Liu, Hao Tang
We propose FOLIAGE, a physics-informed multimodal world model for unbounded accretive surface growth.
1 code implementation • 28 May 2025 • Zhenglun Kong, Zheng Zhan, Shiyue Hou, Yifan Gong, Xin Meng, Pengwei Sui, Peiyan Dong, Xuan Shen, Zifeng Wang, Pu Zhao, Hao Tang, Stratis Ioannidis, Yanzhi Wang
To address these issues, we propose a framework that adaptively selects and aggregates knowledge from diverse LLMs to build a single, stronger model, avoiding the high memory overhead of ensemble and inflexible weight merging.
no code implementations • 28 May 2025 • Yen Meng, Sharon Goldwater, Hao Tang
Modern neural speech models benefit from having longer context, and many approaches have been proposed to increase the maximum context a model can use.
no code implementations • 26 May 2025 • Zhuoheng Gao, Yihao Li, Jiyao Zhang, Rui Zhao, Tong Wu, Hao Tang, Zhaofei Yu, Hao Dong, Guozhang Chen, Tiejun Huang
To address this gap, we propose SpikeStereoNet, a brain-inspired framework and the first to estimate stereo depth directly from raw spike streams.
1 code implementation • 23 May 2025 • Zhenglun Kong, Yize Li, Fanhu Zeng, Lei Xin, Shvat Messica, Xue Lin, Pu Zhao, Manolis Kellis, Hao Tang, Marinka Zitnik
We highlight its potential to drive new model architectures and learning strategies that improve robustness, increase interpretability, and better align with the objectives of generative modeling.
no code implementations • 22 May 2025 • Guohao Huo, Ruiting Dai, Hao Tang
To address the challenge of complex pathological feature extraction in automated cardiac MRI segmentation, this study proposes an innovative dual-encoder architecture named SAMba-UNet.
no code implementations • 22 May 2025 • Runsen Xu, Weiyao Wang, Hao Tang, Xingyu Chen, Xiaodong Wang, Fu-Jen Chu, Dahua Lin, Matt Feiszli, Kevin J. Liang
Multi-modal large language models (MLLMs) have rapidly advanced in visual tasks, yet their spatial understanding remains limited to single images, leaving them ill-suited for robotics and other real-world applications that require multi-frame reasoning.
1 code implementation • 20 May 2025 • Hao Tang, Kevin Ellis, Suhas Lohit, Michael J. Jones, Moitreya Chatterjee
Estimating a world model that describes the dynamics of a real-world process is immensely important for anticipating and preparing for future outcomes.
no code implementations • 20 May 2025 • Sifan Li, Ming Tao, Hao Zhao, Ling Shao, Hao Tang
For scenes that cannot occur in the real world and defy physics, we should spare no effort to increase the factual feel, meaning we synthesize images that people judge as very likely to be happening, and concept alignment, meaning all the required objects appear in the same frame.
no code implementations • 20 May 2025 • Chihan Huang, Hao Tang
Although autoregressive models have dominated language modeling in recent years, there has been a growing interest in exploring alternative paradigms to the conventional next-token prediction framework.
no code implementations • 20 May 2025 • Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Tianqi Li, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Pu Zhao, Xue Lin, Dong Huang, Yanzhi Wang
Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks.
1 code implementation • 16 May 2025 • Wasu Top Piriyakulkij, Yichao Liang, Hao Tang, Adrian Weller, Marta Kryven, Kevin Ellis
Learning how the world works is central to building AI agents that can adapt to complex environments.
no code implementations • 13 May 2025 • Keyu Chen, Hao Tang, Qinglin Liu, Yizhao Xu
Language model alignment is crucial for ensuring that large language models (LLMs) align with human preferences, yet it often involves sensitive user data, raising significant privacy concerns.
1 code implementation • 11 May 2025 • Zihang Liu, Zhenyu Zhang, Hao Tang
To address this limitation, we propose SAMSR, a semantic-guided diffusion framework that incorporates semantic segmentation masks into the sampling process.
no code implementations • 4 May 2025 • Aidan Curtis, Hao Tang, Thiago Veloso, Kevin Ellis, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Partially Observable Markov Decision Processes (POMDPs) model decision making under uncertainty.
no code implementations • 29 Apr 2025 • Jiarui Ye, Hao Tang
At the end of the survey, we discuss the challenges faced by MLLMs in the medical and healthcare domain and propose feasible methods to mitigate or overcome these issues.
no code implementations • 24 Apr 2025 • Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, Jialiang Wang, Kunpeng Li, Xiaoliang Dai, Yujun Shi, Xuan Ju, Yushi Hu, Artsiom Sanakoyeu, Felix Juefei-Xu, Ji Hou, Junjiao Tian, Tao Xu, Tingbo Hou, Yen-Cheng Liu, Zecheng He, Zijian He, Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam Tsai, Yun Fu
A primary limitation is the substantial number of image tokens required for AR models, which constrains both training and inference efficiency, as well as image resolution.
no code implementations • 17 Apr 2025 • Yihua Shao, Haojin He, Sijie Li, Siyu Chen, Xinwei Long, Fanhu Zeng, Yuxuan Fan, Muyang Zhang, Ziyang Yan, Ao Ma, Xiaochen Wang, Hao Tang, Yan Wang, Shuyan Li
Therefore, we propose EventVAD, an event-aware video anomaly detection framework that combines tailored dynamic graph architectures and multimodal LLMs through temporal-event reasoning.
1 code implementation • 13 Apr 2025 • Ting Huang, Zeyu Zhang, Yemin Wang, Hao Tang
3D captioning, which aims to describe the content of 3D scenes in natural language, remains highly challenging due to the inherent sparsity of point clouds and weak cross-modal alignment in existing methods.
Ranked #1 on 3D dense captioning on Nr3D
no code implementations • AAAI 2025 • Zhicheng Zhang, Hao Tang, Jinhui Tang
Specifically, we first propose a multi-scale cue activation module to ensure that the discriminative cues learned at different stages are mutually distinct.
Ranked #3 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • 28 Mar 2025 • Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, Jian Yang
However, they often face challenges with temporal consistency, particularly in the talking head domain, where continuous changes in facial expressions intensify the level of difficulty.
no code implementations • CVPR 2025 • Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao
As interest grows in world models that predict future states from current observations and actions, accurately modeling part-level dynamics has become increasingly relevant for various applications.
no code implementations • 21 Mar 2025 • Jianing Qi, Jiawei Liu, Hao Tang, Zhigang Zhu
Vision-Language Models (VLMs) excel at identifying and describing objects but struggle with spatial reasoning such as accurately understanding the relative positions of objects.
1 code implementation • CVPR 2025 • Fanhu Zeng, Hao Tang, Yihua Shao, Siyu Chen, Ling Shao, Yan Wang
Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage SSMs to address computational inefficiency in existing methods and improve image compression from multiple perspectives.
no code implementations • 11 Mar 2025 • Jiaxuan Zhu, Hao Tang
Representing and rendering dynamic scenes from 2D images is a fundamental yet challenging problem in computer vision and graphics.
no code implementations • 9 Mar 2025 • Yu Liu, Hao Tang, Haiqi Zhang, Jing Qin, Zechao Li
Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications.
no code implementations • 9 Mar 2025 • Yihua Shao, Deyang Lin, Fanhu Zeng, Minxi Yan, Muyang Zhang, Siyu Chen, Yuxuan Fan, Ziyang Yan, Haozhe Wang, Jingcai Guo, Yan Wang, Haotong Qin, Hao Tang
TR-DQ achieves state-of-the-art (SOTA) performance on image generation and video generation tasks and a 1.38-1.89x speedup and 1.97-2.58x memory reduction in inference compared to existing quantization methods.
1 code implementation • 3 Mar 2025 • Hao Tang, ChenWei Xie, Haiyang Wang, Xiaoyi Bao, Tingyu Weng, Pandeng Li, Yun Zheng, LiWei Wang
Generalist models have achieved remarkable success in both language and vision-language tasks, showcasing the potential of unified modeling.
no code implementations • 25 Feb 2025 • Jin Hou, Hao Tang
The normal operation of power equipment plays a critical role in the power system, making anomaly detection for power equipment highly significant.
no code implementations • 24 Feb 2025 • Fanhu Zeng, Haiyang Guo, Fei Zhu, Li Shen, Hao Tang
With the expansion in data and model size, parameter efficient tuning becomes the common practice for obtaining task-specific models efficiently.
no code implementations • 18 Feb 2025 • Tzu-Quan Lin, Wei-Ping Huang, Hao Tang, Hung-Yi Lee
Existing approaches, such as regularizing weight changes during fine-tuning, may fail to maintain sufficiently high feature similarity with the pre-trained model, and thus could possibly lose cross-task generalization.
no code implementations • 6 Feb 2025 • Guohao Huo, Ruiting Dai, Ling Shao, Hao Tang
To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB), which integrates WSPM to extract enriched features from the frequency domain.
no code implementations • 4 Feb 2025 • Bin Xie, Hao Tang, Yan Yan, Gady Agam
Segment Anything Model 2 (SAM 2), a prompt-driven foundation model extending SAM to both image and video domains, has shown superior zero-shot performance compared to its predecessor.
no code implementations • 2 Feb 2025 • Bin Xie, Hao Tang, Dawen Cai, Yan Yan, Gady Agam
We design a multi-scale prompt generator combined with the image encoder in SAM to generate auxiliary masks.
no code implementations • 29 Jan 2025 • Yihua Shao, Minxi Yan, Yang Liu, Siyu Chen, Wenjie Chen, Xinwei Long, Ziyang Yan, Lei LI, Chenyu Zhang, Nicu Sebe, Hao Tang, Yan Wang, Hao Zhao, Mengzhu Wang, Jingcai Guo
As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using CVAE.
1 code implementation • 25 Jan 2025 • Hao Tang, Siyue Yu, Jian Pang, Bingfeng Zhang
Then we propose a class-balance Annotation Similarity Filter (ASF) by comparing the synthetic annotation with the response of CLIP to remove the samples related to low-quality annotations.
no code implementations • 24 Jan 2025 • Zhiwei Chen, Hao Tang
Quantum computing is a transformative technology with wide-ranging applications, and efficient quantum circuit generation is crucial for unlocking its full potential.
1 code implementation • CVPR 2025 • Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives.
no code implementations • 15 Jan 2025 • Hao Tang, Ling Shao, Nicu Sebe, Luc van Gool
Moreover, we propose two novel cross-attention blocks to effectively transfer and update the person's shape and appearance embeddings for mutual improvement.
no code implementations • 8 Jan 2025 • Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Xuan Shen, Pu Zhao, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Xue Lin, Dong Huang, Yanzhi Wang
Although Low-Rank Adaptation (LoRA) is widely used and effective for fine-tuning, we have observed that its scaling factor can limit or even reduce performance as the rank size increases.
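The coupling between LoRA's scaling factor and its rank can be seen in the standard formulation (shown below as a generic sketch, not this paper's proposed remedy): the low-rank update is scaled by alpha / r, so increasing the rank shrinks the per-rank contribution unless alpha is retuned.

```python
# Standard LoRA-style adapter (generic sketch, not this paper's modification):
# the low-rank update B @ A is scaled by alpha / r, tying the scale to the rank.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, r))  # zero-init so the update starts at 0
        self.scaling = alpha / r                        # shrinks as the rank r grows

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, r=64, alpha=16.0)  # alpha / r = 0.25, vs. 2.0 at r = 8
out = layer(torch.randn(4, 768))
```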
no code implementations • 3 Jan 2025 • Rohit Saxena, Hao Tang, Frank Keller
Training transformer-based encoder-decoder models for long document summarization poses a significant challenge due to the quadratic memory consumption during training.
Ranked #1 on Long-Form Narrative Summarization on SummScreen
no code implementations • 2 Jan 2025 • Zhaoyu Chen, Haijing Guo, Kaixun Jiang, Jiyuan Fu, Xinyu Zhou, Dingkang Yang, Hao Tang, Bo Li, Wenqiang Zhang
To achieve high transferability, we propose a technique termed Spatial Adversarial Alignment (SAA), which employs an alignment loss and leverages a witness model to fine-tune the surrogate model.
no code implementations • CVPR 2025 • Mingzhen Huang, Fu-Jen Chu, Bugra Tekin, Kevin J. Liang, Haoyu Ma, Weiyao Wang, Xingyu Chen, Pierre Gleize, Hongfei Xue, Siwei Lyu, Kris Kitani, Matt Feiszli, Hao Tang
We introduce HOIGPT, a token-based generative method that unifies 3D hand-object interactions (HOI) perception and generation, offering the first comprehensive solution for captioning and generating high-quality 3D HOI sequences from a diverse range of conditional signals (e.g., text, objects, partial sequences).
no code implementations • 17 Dec 2024 • Lei Xin, Caiyun Huang, Hao Li, Shihong Huang, Yuling Feng, Zhenglun Kong, Zicheng Liu, Siyuan Li, Chang Yu, Fei Shen, Hao Tang
With the rapid development of high-throughput sequencing platforms, an increasing number of omics technologies, such as genomics, metabolomics, and transcriptomics, are being applied to disease genetics research.
1 code implementation • 17 Dec 2024 • Yuqing Wang, Zhongling Huang, Shuxin Yang, Hao Tang, Xiaolan Qiu, Junwei Han, Dingwen Zhang
PolSAR data presents unique challenges due to its rich and complex characteristics.
no code implementations • 2 Dec 2024 • Hao Tang, Zechao Li, Dong Zhang, Shengfeng He, Jinhui Tang
Furthermore, a Modality-aware Dynamic Aggregation Module in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows.
no code implementations • 26 Nov 2024 • Pirzada Suhail, Hao Tang, Amit Sethi
Neural networks have emerged as powerful tools across various applications, yet their decision-making process often remains opaque, leading to them being perceived as "black boxes."
no code implementations • 26 Nov 2024 • Songtao Li, Hao Tang
This survey offers a comprehensive review of recent advancements in multimodal alignment and fusion within machine learning, spurred by the growing diversity of data types such as text, images, audio, and video.
no code implementations • 25 Nov 2024 • Nonghai Zhang, Hao Tang
When humans read a specific text, they often visualize the corresponding images, and we hope that computers can do the same.
no code implementations • 23 Nov 2024 • Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, Jian Yang
In this paper, we propose ConsistentAvatar, a novel framework for fully consistent and high-fidelity talking avatar generation.
no code implementations • 23 Nov 2024 • Hao Tang, Bin Ren, Pingping Wu, Nicu Sebe
In this paper, we present an innovative solution for the challenges of the virtual try-on task: our novel Hierarchical Cross-Attention Network (HCANet).
no code implementations • 16 Nov 2024 • Jiawei Mao, Yu Yang, Xuesong Yin, Ling Shao, Hao Tang
Specifically, we introduce an All-in-One Transformer Block (AiOTB), which adaptively removes all degradations present in a given image by modeling the relationships between all degradations and the image embedding in latent space.
no code implementations • CVPR 2025 • Xiaoyi Liu, Hao Tang
We introduce DiffFNO, a novel diffusion framework for arbitrary-scale super-resolution strengthened by a Weighted Fourier Neural Operator (WFNO).
1 code implementation • 10 Nov 2024 • Zeyu Zhang, Hang Gao, Akide Liu, Qi Chen, Feng Chen, Yiran Wang, Danning Li, Rui Zhao, ZhenMing Li, Zhongwen Zhou, Hao Tang, Bohan Zhuang
The recent Mamba architecture shows promising results in efficiently modeling long and complex sequences, yet two significant challenges remain: Firstly, directly applying Mamba to extended motion generation is ineffective, as the limited capacity of the implicit memory leads to memory decay.
1 code implementation • 10 Nov 2024 • Hao Tang, Junhao Lu, Guoheng Huang, Ming Li, Xuhang Chen, Guo Zhong, Zhengguang Tan, Zinuo Li
In Few-Shot Learning (FSL), traditional metric-based approaches often rely on global metrics to compute similarity.
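A typical global-metric baseline of the kind referred to here is prototype matching over pooled features (a generic prototypical-network-style sketch, not this paper's model):

```python
# Generic global-metric few-shot classifier (prototypical-network style); the
# kind of baseline the abstract refers to, not this paper's method.
import torch

def prototypes(support: torch.Tensor, labels: torch.Tensor, n_way: int) -> torch.Tensor:
    """support: (n_support, dim) global features; labels: (n_support,) in [0, n_way)."""
    return torch.stack([support[labels == c].mean(dim=0) for c in range(n_way)])

def classify(query: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Assign each query to the class with the nearest prototype (Euclidean distance)."""
    return torch.cdist(query, protos).argmin(dim=1)

support = torch.randn(5 * 5, 640)                  # 5-way 5-shot global features
labels = torch.arange(5).repeat_interleave(5)      # class labels for the support set
query = torch.randn(15, 640)
pred = classify(query, prototypes(support, labels, n_way=5))
```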
1 code implementation • 4 Nov 2024 • Wen-Ding Li, Keya Hu, Carter Larsen, Yuqing Wu, Simon Alford, Caleb Woo, Spencer M. Dunn, Hao Tang, Michelangelo Naim, Dat Nguyen, Wei-Long Zheng, Zenna Tavares, Yewen Pu, Kevin Ellis
When learning an input-output mapping from very few examples, is it better to first infer a latent function that explains the examples, or is it better to directly predict new test outputs, e.g., using a neural network?
no code implementations • 30 Oct 2024 • Yihua Shao, Siyu Liang, Zijian Ling, Minxi Yan, Haiyang Liu, Siyu Chen, Ziyang Yan, Chenyu Zhang, Haotong Qin, Michele Magno, Yang Yang, Zhen Lei, Yan Wang, Jingcai Guo, Ling Shao, Hao Tang
Compared to current quantization methods, GWQ can be applied to multiple language models and achieves lower PPL on the WikiText2 and C4 datasets.
no code implementations • 30 Oct 2024 • Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B. Tenenbaum, Tom Silver, João F. Henriques, Kevin Ellis
Broadly intelligent agents should form task-specific abstractions that selectively expose the essential elements of a task, while abstracting away the complexity of the raw sensorimotor space.
1 code implementation • 17 Oct 2024 • Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, Situo Zhang, Liangtai Sun, Yixiao Wang, Yuheng Sun, Lu Chen, Kai Yu
We also present MobBench, a dataset designed for complex mobile interactions.
no code implementations • 14 Oct 2024 • Dejia Xu, Yifan Jiang, Chen Huang, Liangchen Song, Thorsten Gernoth, Liangliang Cao, Zhangyang Wang, Hao Tang
Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene.
no code implementations • 10 Oct 2024 • Jianing Qi, Hao Tang, Zhigang Zhu
Recent advancements in test time compute, particularly through the use of verifier models, have significantly enhanced the reasoning capabilities of Large Language Models (LLMs).
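A common way verifier models are used at test time is best-of-N reranking (a generic sketch with hypothetical helper names, not necessarily this paper's setup): sample several candidate solutions and keep the one the verifier scores highest.

```python
# Generic best-of-N verifier reranking (hypothetical helpers, not this paper's
# method): a verifier model scores sampled solutions and the best one is kept.
def generate_solution(problem: str) -> str:
    """Placeholder for one sampled LLM solution; replace with a real model call."""
    raise NotImplementedError

def verifier_score(problem: str, solution: str) -> float:
    """Placeholder for a verifier model returning a correctness score in [0, 1]."""
    raise NotImplementedError

def best_of_n(problem: str, n: int = 16) -> str:
    candidates = [generate_solution(problem) for _ in range(n)]
    return max(candidates, key=lambda s: verifier_score(problem, s))
```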
no code implementations • 9 Oct 2024 • Yunzhi Lin, Yipu Zhao, Fu-Jen Chu, Xingyu Chen, Weiyao Wang, Hao Tang, Patricio A. Vela, Matt Feiszli, Kevin Liang
To address the challenge of short-term object pose tracking in dynamic environments with monocular RGB input, we introduce a large-scale synthetic dataset OmniPose6D, crafted to mirror the diversity of real-world conditions.
1 code implementation • 2 Oct 2024 • Renkai Wu, Xianjin Wang, Pengchen Liang, Zhenyu Zhang, Qing Chang, Hao Tang
In addition, we organize and propose a dehaze dataset for robotic vision in urological surgery (USRobot-Dehaze dataset).
no code implementations • 1 Oct 2024 • Aoming Liang, Zhaoyang Mu, Pengxiao Lin, Cong Wang, Mingming Ge, Ling Shao, Dixia Fan, Hao Tang
This universal controllable approach allows the model to achieve greater accuracy.
no code implementations • 29 Sep 2024 • Jun Liu, Geng Yuan, Weihao Zeng, Hao Tang, Wenbin Zhang, Xue Lin, Xiaolin Xu, Dong Huang, Yanzhi Wang
As a result, the diagnostic results produced by the transfer learning model are unreliable.
no code implementations • 15 Sep 2024 • Gene-Ping Yang, Hao Tang
Despite the recent advance in self-supervised representations, unsupervised phonetic segmentation remains challenging.
no code implementations • 9 Sep 2024 • Sung-Lin Yeh, Hao Tang
We find that speaker information is sufficiently present in HuBERT discrete units, and that phonetic information is sufficiently present in the residual, showing that vector quantization does not achieve disentanglement.
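The units-versus-residual decomposition studied here follows the usual vector-quantization view (an illustrative sketch, not the paper's analysis code): each frame is assigned its nearest k-means centroid, and the residual is whatever the centroid fails to capture.

```python
# Illustrative vector-quantization decomposition of frame features into a
# discrete unit (nearest centroid) plus a continuous residual; not the paper's code.
import numpy as np

def quantize(frames: np.ndarray, centroids: np.ndarray):
    """frames: (T, d) speech features; centroids: (K, d), e.g., from k-means."""
    dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=-1)
    units = dists.argmin(axis=1)            # discrete unit sequence, shape (T,)
    residual = frames - centroids[units]    # what the discrete units do not encode
    return units, residual

frames = np.random.randn(80, 64)            # toy stand-in for HuBERT-like frame features
centroids = np.random.randn(100, 64)
units, residual = quantize(frames, centroids)
```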
1 code implementation • 7 Sep 2024 • Tzu-Quan Lin, Guan-Ting Lin, Hung-Yi Lee, Hao Tang
It is, however, desirable to have an approach that can pinpoint exactly a subset of neurons that is responsible for a particular property of speech, being amenable to model pruning and model editing.
no code implementations • 21 Aug 2024 • Zhenyu Lu, Hao Tang
Data-Free Class Incremental Learning (DFCIL) aims to enable models to continuously learn new classes while retaining knowledge of old classes, even when the training data for old classes is unavailable.
1 code implementation • 17 Aug 2024 • Xiaokun Sun, Zhenyu Zhang, Ying Tai, Qian Wang, Hao Tang, Zili Yi, Jian Yang
In this paper, we propose Barbie, a novel framework for generating 3D avatars that can be dressed in diverse and high-quality Barbie-like garments and accessories.
no code implementations • 16 Aug 2024 • Hao Tang, Weiyao Wang, Pierre Gleize, Matt Feiszli
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
no code implementations • 25 Jul 2024 • Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for hardware implementation while preserving the accuracy.
1 code implementation • 19 Jul 2024 • Yuxuan Zhang, Qing Zhang, Yiren Song, Jichao Zhang, Hao Tang, Jiaming Liu
In the second stage, we specifically designed a Hair Extractor and a Latent IdentityNet to transfer the target hairstyle to the bald image with high detail and fidelity.
1 code implementation • 14 Jul 2024 • Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
Text-to-motion generation holds potential for film, gaming, and robotics, yet current methods often prioritize short motion generation, making it challenging to produce long motion sequences effectively: (1) Current methods struggle to handle long motion sequences as a single input due to prohibitively high computational cost; (2) Breaking down the generation of long motion sequences into shorter segments can result in inconsistent transitions and requires interpolation or inpainting, which lacks entire sequence modeling.
1 code implementation • 13 Jul 2024 • Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang
In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach that a 3D model predicts dense-embedding for each point which is co-embedded with both the aligned image and text spaces from the 2D vision-language model.
no code implementations • 12 Jul 2024 • Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg
While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect.
no code implementations • 8 Jul 2024 • Xuan Wang, Hao Tang, Zhigang Zhu
In this paper, GMC, a general framework for multistage context learning and utilization, is proposed, with various deep network architectures for various visual detection tasks.
2 code implementations • 17 Jun 2024 • Jianan Jiang, Hao Tang, Zhilin Jiang, Weiren Yu, Di wu
Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space.
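The objective described, pulling a sketch toward its matching photo in a shared embedding space, is usually trained with a triplet loss (a standard formulation, not necessarily the exact loss used in this paper):

```python
# Standard triplet objective for sketch-photo embeddings (generic formulation):
# pull matching pairs together and push non-matching photos at least `margin` away.
import torch
import torch.nn.functional as F

def triplet_loss(sketch: torch.Tensor, photo_pos: torch.Tensor,
                 photo_neg: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    d_pos = F.pairwise_distance(sketch, photo_pos)   # distance to the matching photo
    d_neg = F.pairwise_distance(sketch, photo_neg)   # distance to a non-matching photo
    return F.relu(d_pos - d_neg + margin).mean()

embed = lambda n: F.normalize(torch.randn(n, 512), dim=1)   # stand-in embeddings
loss = triplet_loss(embed(32), embed(32), embed(32))
```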
no code implementations • 13 Jun 2024 • Mukhtar Mohamed, Oli Danyi Liu, Hao Tang, Sharon Goldwater
Self-supervised speech representations can hugely benefit downstream speech technologies, yet the properties that make them useful are still poorly understood.
1 code implementation • 8 Jun 2024 • Tzu-Quan Lin, Hung-Yi Lee, Hao Tang
We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss, eliminating the need for multiple rounds of training and fine-tuning.
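The exit rule sketched in the abstract, stopping at an intermediate layer once a per-example signal indicates it is safe, can be written abstractly as follows (a simplified sketch with assumed interfaces, not the released DAISY code):

```python
# Simplified early-exit loop (assumed interfaces, not the released DAISY code):
# run encoder layers one at a time and stop once a per-layer loss estimate drops
# below a threshold, returning that layer's representation.
import torch
import torch.nn as nn

def early_exit_forward(layers: nn.ModuleList, loss_heads: nn.ModuleList,
                       x: torch.Tensor, threshold: float) -> tuple[torch.Tensor, int]:
    h = x
    for i, (layer, head) in enumerate(zip(layers, loss_heads)):
        h = layer(h)
        proxy_loss = head(h).mean()      # per-layer estimate of the self-supervised loss
        if proxy_loss.item() < threshold:
            return h, i                  # confident enough: exit at layer i
    return h, len(layers) - 1            # otherwise fall through to the last layer
```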
1 code implementation • 4 Jun 2024 • Xiaofeng Zhang, Yihao Quan, Chen Shen, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, Chaochen Gu, Hao Tang, Jieping Ye
Large Vision Language Models (LVLMs) achieve great performance on visual-language reasoning tasks; however, the black-box nature of LVLMs hinders in-depth research on their reasoning mechanisms.
1 code implementation • 28 May 2024 • Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You
To tackle this challenge, we propose InfoGrowth, an efficient online algorithm for data cleaning and selection, resulting in a growing dataset that keeps up to date with awareness of cleanliness and diversity.
no code implementations • 26 May 2024 • Hao Tang, Keya Hu, Jin Peng Zhou, Sicheng Zhong, Wei-Long Zheng, Xujie Si, Kevin Ellis
Iteratively improving and repairing source code with large language models (LLMs), known as refinement, has emerged as a popular way of generating programs that would be too complex to construct in one shot.
no code implementations • 13 May 2024 • Oli Danyi Liu, Hao Tang, Naomi Feldman, Sharon Goldwater
Speech perception involves storing and integrating sequentially presented items.
1 code implementation • 9 May 2024 • Hao Tang, Brian Xiao, Wenhao He, Pero Subasic, Avetik R. Harutyunyan, Yao Wang, Fang Liu, Haowei Xu, Ju Li
Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various properties of molecules.
no code implementations • 24 Apr 2024 • Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li
This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models.
no code implementations • 18 Apr 2024 • Yihua Shao, Yeling Xu, Xinwei Long, Siyu Chen, Ziyang Yan, Yang Yang, Haoting Liu, Yan Wang, Hao Tang, Zhen Lei
In particular, AccidentBlip achieves SOTA performance in both accident detection and prediction tasks on the DeepAccident dataset.
no code implementations • 14 Apr 2024 • Jianyuan Ni, Hao Tang, Syed Tousiful Haque, Yan Yan, Anne H. H. Ngu
We begin by presenting the recent sensor modalities as well as deep learning approaches in HAR.
1 code implementation • 9 Apr 2024 • Ming Tao, Bing-Kun Bao, Hao Tang, YaoWei Wang, Changsheng Xu
3) The story visualization and continuation models are trained and inferred independently, which is not user-friendly.
1 code implementation • CVPR 2024 • Wencan Cheng, Hao Tang, Luc van Gool, Jong Hwan Ko
Extracting keypoint locations from input hand frames, known as 3D hand pose estimation, is a critical task in various human-computer interaction applications.
no code implementations • CVPR 2024 • Haoyu Chen, Hao Tang, Ehsan Adeli, Guoying Zhao
This work is driven by the intuition that introducing adversarial samples into training can enhance the robustness of the model, making it less vulnerable to noisy inputs and even allowing it to directly handle real-world data such as raw point clouds/scans without intermediate processing.
no code implementations • CVPR 2024 • Gengyu Zhang, Hao Tang, Yan Yan
To address these deficiencies, we propose a versatile diffusion-based approach for both 2D and 3D route planning under partial observability.
no code implementations • CVPR 2024 • Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan
In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations.
1 code implementation • CVPR 2024 • YingJie Xu, Bangzhen Liu, Hao Tang, Bailin Deng, Shengfeng He
We propose a voxel-based optimization framework, ReVoRF, for few-shot radiance fields that strategically addresses the unreliability in pseudo novel view synthesis.
no code implementations • 26 Mar 2024 • Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu
Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain.
no code implementations • 24 Mar 2024 • Guillaume Thiry, Hao Tang, Radu Timofte, Luc van Gool
Video inpainting tasks have seen significant improvements in recent years with the rise of deep neural networks and, in particular, vision transformers.
no code implementations • CVPR 2024 • Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan
To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects.
no code implementations • 21 Mar 2024 • Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan
Each pair of auxiliary mask and box prompts, which satisfies the requirement for extra prompts, is associated with class label predictions via the sum of the auxiliary classifier token and the learnable global classifier tokens in the mask decoder of SAM, enabling the prediction of semantic labels.
no code implementations • 16 Mar 2024 • Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li
In this paper, we introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on.
no code implementations • 16 Mar 2024 • Jun Liu, Zhenglun Kong, Pu Zhao, Changdi Yang, Hao Tang, Xuan Shen, Geng Yuan, Wei Niu, Wenbin Zhang, Xue Lin, Dong Huang, Yanzhi Wang
For example, HyWIA surpasses the cutting-edge LLM-Pruner by an average margin of 2.82% in accuracy across seven downstream tasks when pruning LLaMA-7B by 50%.
no code implementations • 14 Mar 2024 • Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao
Semantic image synthesis (SIS) shows good promise for sensor simulation.
1 code implementation • 14 Mar 2024 • Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, LiWei Wang
Due to its simple design, this paradigm holds promise for narrowing the architectural gap between vision and language.
Ranked #2 on Video Captioning on MSVD-CTN (using extra training data)
1 code implementation • 12 Mar 2024 • Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging.
Ranked #12 on Motion Synthesis on KIT Motion-Language
no code implementations • 8 Mar 2024 • Zichong Meng, Changdi Yang, Jun Liu, Hao Tang, Pu Zhao, Yanzhi Wang
In response to this challenge, our study introduces a novel image editing framework with enhanced generalization robustness by boosting in-context learning capability and unifying language instruction.
1 code implementation • 1 Mar 2024 • Tom Hosking, Hao Tang, Mirella Lapata
We show that HIRO learns an encoding space that is more semantically structured than prior work, and generates summaries that are more representative of the opinions in the input reviews.
no code implementations • 19 Feb 2024 • Hao Tang, Darren Key, Kevin Ellis
We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment.
no code implementations • 14 Feb 2024 • Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, PengFei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni
The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.
no code implementations • CVPR 2024 • Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli
Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images.
no code implementations • 15 Jan 2024 • Hao Tang, Ling Shao, Nicu Sebe, Luc van Gool
Finally, we propose a novel self-guided pre-training method for graph representation learning.
1 code implementation • CVPR 2024 • Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing
Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging.
no code implementations • 15 Dec 2023 • Xiaofeng Zhang, Zishan Xu, Hao Tang, Chaochen Gu, Wei Chen, Shanying Zhu, Xinping Guan
Low-light image enhancement is a crucial visual task, and many unsupervised methods tend to overlook the degradation of visible information in low-light scenes, which adversely affects the fusion of complementary information and hinders the generation of satisfactory results.
no code implementations • 13 Dec 2023 • Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang
In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.
no code implementations • 10 Nov 2023 • Ziye Fang, Xin Jiang, Hao Tang, Zechao Li
In the field of intelligent multimedia analysis, ultra-fine-grained visual categorization (Ultra-FGVC) plays a vital role in distinguishing intricate subcategories within broader categories.
1 code implementation • 7 Nov 2023 • Neng Dong, Shuanglin Yan, Hao Tang, Jinhui Tang, Liyan Zhang
Moreover, as multiple images with the same identity are not accessible in the testing stage, we devise an Information Propagation (IP) mechanism to distill knowledge from the comprehensive representation to that of a single occluded image.
no code implementations • 2 Nov 2023 • Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc van Gool, Yanning Zhang
To address this challenge, we formulate the HDR deghosting problem as an image generation task that leverages LDR features as the diffusion model's condition, consisting of the feature condition generator and the noise predictor.
no code implementations • 26 Oct 2023 • Gene-Ping Yang, Hao Tang
We study two key properties that enable matching, namely, whether cluster centroids of self-supervised representations reduce the variability of phone instances and respect the relationship among phones.
no code implementations • 19 Oct 2023 • Changhao Li, Boning Li, Omar Amer, Ruslan Shaydulin, Shouvanik Chakrabarti, Guoqing Wang, Haowei Xu, Hao Tang, Isidor Schoch, Niraj Kumar, Charles Lim, Ju Li, Paola Cappellaro, Marco Pistoia
Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes.
no code implementations • 15 Oct 2023 • Jiahao Xia, Gavin Gong, Jiawei Liu, Zhigang Zhu, Hao Tang
In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data including LiDAR data and satellite imagery data.
2 code implementations • NeurIPS 2023 • Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, JianXin Li
We validate the proposed SGDD across 9 datasets and achieve state-of-the-art results on all of them: for example, on the YelpChi dataset, our approach maintains 98.6% test accuracy of training on the original graph dataset with 1,000 times saving on the scale of the graph.
no code implementations • 4 Oct 2023 • Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao
Although the fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in a notoriously long time and high computational costs.
no code implementations • CVPR 2024 • Sanghwan Kim, Hao Tang, Fisher Yu
Notably, our method incurs negligible computational overhead compared to previous distillation techniques, facilitating straightforward and rapid integration with existing samplers.
no code implementations • 20 Sep 2023 • Yifeng Xiong, Haoyu Ma, Shanlin Sun, Kun Han, Hao Tang, Xiaohui Xie
Starting from the camera pose matrices, LFD transforms them into light field encoding, with the same shape as the reference image, to describe the direction of each ray.
no code implementations • 16 Sep 2023 • Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li
In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pre-training (CLIP) model.
Ranked #13 on Fine-Grained Image Classification on NABirds
1 code implementation • 14 Sep 2023 • Zhaochong An, Guolei Sun, Zongwei Wu, Hao Tang, Luc van Gool
Modern approaches have proved the huge potential of addressing semantic segmentation as a mask classification task which is widely used in instance-level segmentation.
1 code implementation • 4 Sep 2023 • Lei Ding, Kun Zhu, Daifeng Peng, Hao Tang, Kuiwu Yang, Lorenzo Bruzzone
In this work, we aim to utilize the strong visual recognition capabilities of VFMs to improve the change detection of high-resolution Remote Sensing Images (RSIs).
3 code implementations • ICCV 2023 • Haiyang Wang, Hao Tang, Shaoshuai Shi, Aoxue Li, Zhenguo Li, Bernt Schiele, LiWei Wang
Jointly processing information from multiple sensors is crucial to achieving accurate and robust perception for reliable autonomous driving systems.
Ranked #8 on 3D Object Detection on nuScenes
no code implementations • 6 Aug 2023 • Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, Jinhui Tang
Due to the scarcity of manually annotated data required for fine-grained video understanding, few-shot fine-grained (FS-FG) action recognition has gained significant attention, with the aim of classifying novel fine-grained action categories with only a few labeled instances.
no code implementations • 31 Jul 2023 • Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci
This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP.
no code implementations • 23 Jul 2023 • Shanlin Sun, Thanh-Tung Le, Chenyu You, Hao Tang, Kun Han, Haoyu Ma, Deying Kong, Xiangyi Yan, Xiaohui Xie
We present Hybrid-CSR, a geometric deep-learning model that combines explicit and implicit shape representations for cortical surface reconstruction.
1 code implementation • 22 Jul 2023 • Hao Tang, Guolei Sun, Nicu Sebe, Luc van Gool
To tackle 2), we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout to preserve the semantic information.
1 code implementation • 14 Jul 2023 • Neng Dong, Liyan Zhang, Shuanglin Yan, Hao Tang, Jinhui Tang
Occlusion perturbation presents a significant challenge in person re-identification (re-ID), and existing methods that rely on external visual cues require additional computational resources and only consider the issue of missing information caused by occlusion.
1 code implementation • 21 Jun 2023 • Chengchao Shen, Dawei Liu, Hao Tang, Zhe Qu, Jianxin Wang
In this paper, we propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformer (ViT), to model inter-instance similarities among images.
2 code implementations • 17 Jun 2023 • Qihan Zhao, Xiaofeng Zhang, Hao Tang, Chaochen Gu, Shanying Zhu
Image restoration is a low-level visual task, and most CNN methods are designed as black boxes, lacking transparency and intrinsic aesthetics.
no code implementations • 3 Jun 2023 • Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater
Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units.
no code implementations • 1 Jun 2023 • Linhui Dai, Hong Liu, Pinhao Song, Hao Tang, Runwei Ding, Shengquan Li
The key to addressing these challenges is to focus the model on obtaining more discriminative information.
1 code implementation • 24 May 2023 • Tong Xu, Micol Spitale, Hao Tang, Lu Liu, Hatice Gunes, Siyang Song
This means that we approach this problem by considering the generation of a distribution of the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i.e., 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training.
no code implementations • 21 May 2023 • Oli Liu, Hao Tang, Sharon Goldwater
Self-supervised speech representations are known to encode both speaker and phonetic information, but how they are distributed in the high-dimensional space remains largely unexplored.
1 code implementation • 19 May 2023 • Tom Hosking, Hao Tang, Mirella Lapata
We propose a method for unsupervised opinion summarization that encodes sentences from customer reviews into a hierarchical discrete latent space, then identifies common opinions based on the frequency of their encodings.
no code implementations • CVPR 2023 • Hao Tang, Songhua Liu, Tianwei Lin, Shaoli Huang, Fu Li, Dongliang He, Xinchao Wang
On the other hand, different from the vanilla version, we adopt a learnable scaling operation on content features before content-style feature interaction, which better preserves the original similarity between a pair of content features while ensuring the stylization quality.
no code implementations • CVPR 2023 • Qingsen Yan, Song Zhang, Weiye Chen, Hao Tang, Yu Zhu, Jinqiu Sun, Luc van Gool, Yanning Zhang
In this work, we propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR.
no code implementations • 6 Apr 2023 • Xiangyi Yan, Junayed Naushad, Chenyu You, Hao Tang, Shanlin Sun, Kun Han, Haoyu Ma, James Duncan, Xiaohui Xie
In this paper, we propose a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation.
no code implementations • CVPR 2023 • Guofeng Mei, Hao Tang, Xiaoshui Huang, Weijie Wang, Juan Liu, Jian Zhang, Luc van Gool, Qiang Wu
Deep point cloud registration methods face challenges with partial overlaps and rely on labeled data.
no code implementations • CVPR 2023 • Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc van Gool
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.
no code implementations • 9 Mar 2023 • Hao Tang, Aref Miri Rekavandi, Dharjinder Rooprai, Girish Dwivedi, Frank Sanfilippo, Farid Boussaid, Mohammed Bennamoun
This study investigates the effectiveness of Explainable Artificial Intelligence (XAI) techniques in predicting suicide risks and identifying the dominant causes for such behaviours.
1 code implementation • CVPR 2023 • Xuan Shen, Yaohua Wang, Ming Lin, Yilun Huang, Hao Tang, Xiuyu Sun, Yanzhi Wang
To this end, a novel framework termed Mathematical Architecture Design for Deep CNN (DeepMAD) is proposed to design high-performance CNN models in a principled way.
Ranked #1 on Neural Architecture Search on ImageNet
1 code implementation • 3 Feb 2023 • Chao Yu, Jiaxuan Gao, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu
A crucial limitation of this framework is that every policy in the pool is optimized w.r.t.
2 code implementations • CVPR 2023 • Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu
The complex scene understanding ability of CLIP enables the discriminator to accurately assess the image quality.
Ranked #4 on Text-to-Image Generation on CUB
1 code implementation • 24 Jan 2023 • Baptiste Chopin, Hao Tang, Mohamed Daoudi
The generation of natural human motion interactions is a hot topic in computer vision and computer animation.
no code implementations • ICCV 2023 • Jianbing Wu, Hong Liu, Yuxin Su, Wei Shi, Hao Tang
Owing to the large distribution gap between the heterogeneous data in Visible-Infrared Person Re-identification (VI Re-ID), we point out that existing paradigms often suffer from the inter-modal semantic misalignment issue and thus fail to align and compare local details properly.
no code implementations • CVPR 2023 • Changdi Yang, Pu Zhao, Yanyu Li, Wei Niu, Jiexiong Guan, Hao Tang, Minghai Qin, Bin Ren, Xue Lin, Yanzhi Wang
With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications.
1 code implementation • 12 Dec 2022 • Hui Wei, Zhixiang Wang, Xuemei Jia, Yinqiang Zheng, Hao Tang, Shin'ichi Satoh, Zheng Wang
Adversarial attacks on thermal infrared imaging expose the risk of related applications.
no code implementations • 7 Dec 2022 • Hao Ding, Changchang Sun, Hao Tang, Dawen Cai, Yan Yan
Recently, due to the increasing demands of medical imaging applications and the professional expertise required to annotate medical images, few-shot learning has gained increasing attention in the medical image semantic segmentation field.
1 code implementation • 19 Nov 2022 • Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.
1 code implementation • 17 Nov 2022 • Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-Yi Lee, Hao Tang
Transformer-based self-supervised models have achieved remarkable success in speech processing, but their large size and high inference cost present significant challenges for real-world deployment.
1 code implementation • 17 Nov 2022 • Tzu-Quan Lin, Hung-Yi Lee, Hao Tang
Self-supervised models have had great success in learning speech representations that can generalize to various downstream tasks.
no code implementations • 12 Nov 2022 • Hao Tang, Lei Ding, Songsong Wu, Bin Ren, Nicu Sebe, Paolo Rota
The proposed TSDPC is a generic and powerful framework, and it has two advantages compared with previous works; one is that it can calculate the number of key frames automatically.
1 code implementation • 12 Nov 2022 • Hao Tang, Ling Shao, Philip H. S. Torr, Nicu Sebe
To further capture the change in pose of each part more precisely, we propose a novel part-aware bipartite graph reasoning (PBGR) block to decompose the task of reasoning the global structure transformation with a bipartite graph into learning different local transformations for different semantic body/face parts.
1 code implementation • 2 Nov 2022 • Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang
That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches.
no code implementations • 29 Oct 2022 • Sung-Lin Yeh, Hao Tang
While discrete latent variable models have had great success in self-supervised learning, most models assume that frames are independent.
no code implementations • 28 Oct 2022 • Ramon Sanabria, Hao Tang, Sharon Goldwater
Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments.
1 code implementation • 27 Oct 2022 • Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang
Moreover, by coupling the proposed sampling method with an unconditional DM, i.e., a DM with no auxiliary inputs to its noise predictor, we can generalize it to a wide range of SR setups.
no code implementations • 19 Oct 2022 • Peng Xing, Hao Tang, Jinhui Tang, Zechao Li
However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations, and 2) the features of the teacher network serve solely as a ``reference standard" and are not fully leveraged.
no code implementations • 13 Oct 2022 • Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-Yi Lee, Hao Tang
Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates, but also brings significant speed-up in inference.
1 code implementation • 10 Oct 2022 • Yitong Xia, Hao Tang, Radu Timofte, Luc van Gool
NeRFmm is a Neural Radiance Fields (NeRF) variant that deals with joint optimization tasks, i.e., reconstructing real-world scenes and registering camera parameters simultaneously.
1 code implementation • 4 Oct 2022 • Zican Zha, Hao Tang, Yunlian Sun, Jinhui Tang
To address this challenging task, we propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric.
1 code implementation • 30 Sep 2022 • Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Luc van Gool, Zheng Wang
Building upon this foundation, we uncover the pervasive role of artifacts carrying adversarial perturbations in the physical world.
2 code implementations • 16 Sep 2022 • Haoyu Ma, Zhe Wang, Yifei Chen, Deying Kong, Liangjian Chen, Xingwei Liu, Xiangyi Yan, Hao Tang, Xiaohui Xie
In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which can locate a rough human mask and performs self-attention only within selected tokens.
Ranked #20 on 3D Human Pose Estimation on Human3.6M (using extra training data)
1 code implementation • 5 Sep 2022 • Hao Tang, Nicu Sebe
We propose a simple yet powerful Landmark guided Generative Adversarial Network (LandmarkGAN) for the facial expression-to-expression translation using a single image, which is an important and challenging task in computer vision since the expression-to-expression translation is a non-linear and non-aligned problem.
no code implementations • 30 Aug 2022 • Shuanglin Yan, Hao Tang, Liyan Zhang, Jinhui Tang
Moreover, existing methods seldom consider the information inequality problem between modalities caused by image-specific information.
1 code implementation • 26 Aug 2022 • Jichao Zhang, Aliaksandr Siarohin, Yahui Liu, Hao Tang, Nicu Sebe, Wei Wang
Generative Neural Radiance Fields (GNeRF)-based 3D-aware GANs have showcased remarkable prowess in crafting high-fidelity images while upholding robust 3D consistency, particularly face generation.
1 code implementation • 25 Aug 2022 • Jianbing Wu, Hong Liu, Wei Shi, Hao Tang, Jingwen Guo
To mitigate the resolution degradation issue and mine identity-sensitive cues from human faces, we propose to restore the missing facial details using prior facial knowledge, which is then propagated to a smaller network.
no code implementations • 19 Aug 2022 • Pan Xie, Qipeng Zhang, Taiyi Peng, Hao Tang, Yao Du, Zexian Li
Our approach focuses on the transformation of sign gloss sequences into their corresponding sign pose sequences (G2P).
no code implementations • 10 Aug 2022 • Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang
Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width.
1 code implementation • 25 Jul 2022 • Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang
Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence.
1 code implementation • 21 Jul 2022 • JieZhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc van Gool
These issues can be alleviated by a cascade of three separate sub-tasks, including video deblurring, frame interpolation, and super-resolution, which, however, would fail to capture the spatial and temporal correlations among video sequences.
Ranked #7 on Video Super-Resolution on REDS4 (4x upscaling)
1 code implementation • 21 Jul 2022 • Guolei Sun, Yun Liu, Hao Tang, Ajad Chhatkuli, Le Zhang, Luc van Gool
The essence of video semantic segmentation (VSS) is how to leverage temporal information for prediction.
Ranked #4 on Video Semantic Segmentation on VSPW
no code implementations • 17 Jul 2022 • Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan
Brain vessel image segmentation can be used as a promising biomarker for better prevention and treatment of different diseases.
1 code implementation • 16 Jul 2022 • Daqian Shi, Xiaolei Diao, Hao Tang, Xiaomin Li, Hao Xing, Hao Xu
SENet aims to preserve the structural consistency of the character and normalize complex noise.
1 code implementation • 16 Jul 2022 • Daqian Shi, Xiaolei Diao, Lida Shi, Hao Tang, Yang Chi, Chuntao Li, Hao Xu
Degraded images commonly exist in the general sources of character images, leading to unsatisfactory character recognition results.
no code implementations • 12 Jul 2022 • Xiaolei Diao, Daqian Shi, Hao Tang, Qiang Shen, Yanzeng Li, Lei Wu, Hao Xu
The long-tail effect is a common issue that limits the performance of deep learning models on real-world datasets.
1 code implementation • 9 Jul 2022 • Bin Ren, Hao Tang, Yiming Wang, Xia Li, Wei Wang, Nicu Sebe
For semantic-guided cross-view image translation, it is crucial to learn where to sample pixels from the source view image and where to reallocate them guided by the target view semantic map, especially when there is little overlap or drastic view difference between the source and target images.
1 code implementation • 7 Jul 2022 • Zhan Chen, Hong Liu, Tianyu Guo, Zhengyan Chen, Pinhao Song, Hao Tang
First, SkeleMix utilizes the topological information of skeleton data to mix two skeleton sequences by randomly combining the cropped skeleton fragments (the trimmed view) with the remaining skeleton sequences (the truncated view).
1 code implementation • 4 Jul 2022 • Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, Nicu Sebe
To address this limitation, we propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attention.
1 code implementation • 1 Jul 2022 • Jichao Zhang, Jingjing Chen, Hao Tang, Enver Sangineto, Peng Wu, Yan Yan, Nicu Sebe, Wei Wang
Solving this problem using an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels.
1 code implementation • 29 Jun 2022 • Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang, Gordon Wetzstein, Leonidas Guibas, Luc van Gool, Radu Timofte
Generative models have emerged as an essential building block for many image synthesis and editing tasks.
1 code implementation • 13 Jun 2022 • Wenhao Li, Mengyuan Liu, Hong Liu, Tianyu Guo, Ti Wang, Hao Tang, Nicu Sebe
To the best of our knowledge, this is the first MLP-Like architecture for 3D human pose estimation in a single frame and a video sequence.
Ranked #67 on 3D Human Pose Estimation on Human3.6M
no code implementations • 13 Jun 2022 • Hao Tang, Kevin Ellis
Toward combining inductive reasoning with perception abilities, we develop techniques for neurosymbolic program synthesis where perceptual input is first parsed by neural nets into a low-dimensional interpretable representation, which is then processed by a synthesized program.
no code implementations • 7 Jun 2022 • Shanlin Sun, Kun Han, Chenyu You, Hao Tang, Deying Kong, Junayed Naushad, Xiangyi Yan, Haoyu Ma, Pooya Khosravi, James S. Duncan, Xiaohui Xie
Traditional methods for image registration are primarily optimization-driven, finding the optimal deformations that maximize the similarity between two images.
no code implementations • 2 Jun 2022 • Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang
In this work we demonstrate real-time portrait stylization, specifically, translating self-portrait into cartoon or anime style on mobile devices.
1 code implementation • 2 Jun 2022 • Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian
To solve these limitations, we propose: (i) a Dynamic Editing Block (DEBlock) which composes different editing modules dynamically for various editing requirements.
1 code implementation • 25 May 2022 • Linhui Dai, Hong Liu, Hao Tang, Zhiwei Wu, Pinhao Song
Comprehensive experiments on several challenging datasets show that our method achieves superior performance on the AOOD task.
no code implementations • 25 Apr 2022 • Gene-Ping Yang, Hao Tang
Attention mechanism in sequence-to-sequence models is designed to model the alignments between acoustic features and output tokens in speech recognition.
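The alignments in question come from the standard attention computation (textbook formulation): at each output step, the softmax weights over the encoder's acoustic frames form a soft alignment.

```python
# Standard scaled dot-product attention (textbook formulation): the softmax
# weights act as a soft alignment between output tokens and acoustic frames.
import torch
import torch.nn.functional as F

def attention(queries: torch.Tensor, keys: torch.Tensor, values: torch.Tensor):
    """queries: (n_tokens, d); keys/values: (n_frames, d)."""
    scores = queries @ keys.T / keys.shape[-1] ** 0.5
    align = F.softmax(scores, dim=-1)        # (n_tokens, n_frames) soft alignment
    return align @ values, align

tokens = torch.randn(10, 256)                # decoder states for 10 output tokens
frames = torch.randn(200, 256)               # encoder states for 200 acoustic frames
context, alignment = attention(tokens, frames, frames)
```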
no code implementations • 2 Apr 2022 • Zeyong Wei, Honghua Chen, Hao Tang, Qian Xie, Mingqiang Wei, Jun Wang
The circle is one of the fundamental geometric primitives of man-made engineering objects.
1 code implementation • 29 Mar 2022 • Sung-Lin Yeh, Hao Tang
While several self-supervised approaches for learning discrete speech representation have been proposed, it is unclear how these seemingly similar approaches relate to each other.