no code implementations • 26 Mar 2025 • Tao Wu, Tie Luo
To address this, we propose Feature Permutation Attack (FPA), a zero-FLOP, parameter-free method that enhances adversarial transferability across diverse architectures.
no code implementations • 14 Mar 2025 • Ruiqian Li, Siyuan Shen, Suan Xia, Ziheng Wang, Xingyue Peng, Chengxuan Song, Yingsheng Zhu, Tao Wu, Shiying Li, Jingyi Yu
High frame rates, for example, can be achieved by reducing either per-point scanning time or scanning density, but at the cost of lowering the information density at individual frames.
no code implementations • 7 Mar 2025 • Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian, Cong Chen, dingnan jin, Feng Yu, Feng Zhu, Feng Yuan, Fakang Wang, Gangshan Wang, Guangyao Zhai, HaiTao Zhang, Huizhong Li, Jun Zhou, Jia Liu, Junpeng Fang, Junjie Ou, Jun Hu, Ji Luo, Ji Zhang, Jian Liu, Jian Sha, Jianxue Qian, Jiewei Wu, Junping Zhao, Jianguo Li, Jubao Feng, Jingchao Di, Junming Xu, Jinghua Yao, Kuan Xu, Kewei Du, Longfei Li, Lei Liang, Lu Yu, Li Tang, Lin Ju, Peng Xu, Qing Cui, Song Liu, Shicheng Li, Shun Song, Song Yan, Tengwei Cai, Tianyi Chen, Ting Guo, Ting Huang, Tao Feng, Tao Wu, Wei Wu, Xiaolu Zhang, Xueming Yang, Xin Zhao, Xiaobo Hu, Xin Lin, Yao Zhao, Yilong Wang, Yongzhen Guo, Yuanyuan Wang, Yue Yang, Yang Cao, Yuhao Fu, Yi Xiong, Yanzhe Li, Zhe Li, Zhiqiang Zhang, Ziqi Liu, ZhaoXin Huan, Zujie Wen, Zhenhang Sun, Zhuoxuan Du, Zhengyu He
Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models.
no code implementations • 14 Feb 2025 • Teng Li, Guangcong Zheng, Rui Jiang, Shuigenzhan, Tao Wu, Yehao Lu, Yining Lin, Xi Li
Recent advancements in camera-trajectory-guided image-to-video generation offer higher precision and better support for complex camera control compared to text-based approaches.
no code implementations • 22 Jan 2025 • Jingyuan Chen, Tao Wu, Wei Ji, Fei Wu
Large language models (LLMs) have emerged as powerful tools in natural language processing (NLP), showing a promising path toward artificial general intelligence (AGI).
no code implementations • 8 Jan 2025 • Lin Yuan, Kai Liang, Xiong Li, Tao Wu, Nannan Wang, Xinbo Gao
However, many still face limitations in visual quality and often overlook the potential to recover the original face from the anonymized version, which can be valuable in specific contexts such as image forensics.
no code implementations • 31 Dec 2024 • Zhenpeng Huang, Xinhao Li, Jiaqi Li, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, LiMin Wang
Despite the lower computational cost and higher efficiency, VideoChat-Online outperforms existing state-of-the-art offline and online models across popular offline video benchmarks and OVBench, demonstrating the effectiveness of our model architecture and training strategy.
1 code implementation • 29 Dec 2024 • Yulin Fei, Yuhui Gao, Xingyuan Xian, Xiaojin Zhang, Tao Wu, Wei Chen
With the rise of multimodal large language models, accurately extracting and understanding textual information from video content, referred to as video-based optical character recognition (Video OCR), has become a crucial capability.
no code implementations • 27 Dec 2024 • Tao Wu, Yong Zhang, Xiaodong Cun, Zhongang Qi, Junfu Pu, Huanzhang Dou, Guangcong Zheng, Ying Shan, Xi Li
Zero-shot customized video generation has gained significant attention due to its substantial application potential.
no code implementations • 14 Dec 2024 • Tao Wu, Chuhao Zhou, Yen Heng Wong, Lin Gu, Jianfei Yang
Additionally, we also propose a 'Self-Correction' prompting mechanism and a new evaluation metric to enhance and measure both noise detection capability and answer quality.
1 code implementation • 5 Dec 2024 • Jun Zhang, Desen Meng, Ji Qi, Zhenpeng Huang, Tao Wu, LiMin Wang
In this paper, we propose to build efficient MLLMs by leveraging the Mixture-of-Depths (MoD) mechanism, where each transformer decoder layer selects essential vision tokens to process while skipping redundant ones.
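The token-selection idea in the entry above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mod_layer`, the per-token `scores` (standing in for a learned router), and the fixed `capacity` are all assumptions made for illustration. Only the top-scoring tokens go through the layer; the rest are passed along unchanged.

```python
def mod_layer(tokens, scores, capacity, block):
    """Apply `block` to the `capacity` highest-scoring tokens; skip the rest.

    tokens   -- list of token representations (plain floats in this toy example)
    scores   -- hypothetical router scores, one per token
    capacity -- how many tokens the layer is allowed to process
    block    -- the layer computation, applied per selected token
    """
    k = min(capacity, len(tokens))
    # Indices of the k highest-scoring tokens.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    # Selected tokens are processed; skipped tokens pass through unchanged.
    return [block(t) if i in keep else t for i, t in enumerate(tokens)]


# Toy usage: the "layer" simply doubles its input.
tokens = [1.0, 2.0, 3.0, 4.0]
scores = [0.9, 0.1, 0.8, 0.2]
out = mod_layer(tokens, scores, capacity=2, block=lambda t: 2 * t)
print(out)  # [2.0, 2.0, 6.0, 4.0]
```

Skipped tokens keep their position in the sequence, so later layers can still select them.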
1 code implementation • 21 Oct 2024 • Guangcong Zheng, Teng Li, Rui Jiang, Yehao Lu, Tao Wu, Xi Li
We innovatively associate the quality of a condition with its ability to reduce uncertainty and interpret noisy cross-frame features as a form of noisy condition.
no code implementations • 24 Sep 2024 • Xingping Xian, Jianlu Liu, Chao Wang, Tao Wu, Shaojie Qiao, Xiaochuan Tang, Qun Liu
First, GISExplainer defines a causal attribution mechanism that considers the game-theoretic interaction of multi-granularity coalitions in the candidate explanatory subgraph to quantify the causal effect of an edge on the prediction.
no code implementations • 28 Aug 2024 • Zihan Huang, Tao Wu, Wang Lin, Shengyu Zhang, Jingyuan Chen, Fei Wu
With the rapid advancement of large language models, there has been a growing interest in their capabilities in mathematical reasoning.
no code implementations • 23 Aug 2024 • Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu
By involving the bidirectional semantic guidance between different images in the visual-token extraction process, SAM aims to enhance the preservation of linking information for coherent analysis and align the semantics of different images before feeding them into LLM.
1 code implementation • 23 Aug 2024 • Tao Wu, Yong Zhang, Xintao Wang, Xianpan Zhou, Guangcong Zheng, Zhongang Qi, Ying Shan, Xi Li
In this paper, we propose CustomCrafter, a novel framework that preserves the model's motion generation and conceptual combination abilities without requiring additional video or fine-tuning to recover them.
no code implementations • 11 Aug 2024 • Guoan Xu, Wenfeng Huang, Tao Wu, Ligeng Chen, Wenjing Jia, Guangwei Gao, Xiatian Zhu, Stuart Perry
Semantic segmentation involves assigning a specific category to each pixel in an image.
no code implementations • 10 Jul 2024 • Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen, Guangwei Gao
In this paper, we introduce HAFormer, a model that combines the hierarchical feature extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges.
no code implementations • 20 Jun 2024 • Tao Wu, Canyixing Cui, Xingping Xian, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu
Graph neural networks (GNNs) have achieved tremendous success, but recent studies have shown that GNNs are vulnerable to adversarial attacks, which significantly hinders their use in safety-critical scenarios.
no code implementations • 19 Jun 2024 • Tao Wu, Xinwen Cao, Chao Wang, Shaojie Qiao, Xingping Xian, Lin Yuan, Canyixing Cui, Yanbing Liu
Numerous adversarial defense methods for GNNs have been proposed to address the problem of adversarial attacks.
no code implementations • 17 May 2024 • Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, LiMin Wang
Open-vocabulary spatio-temporal action detection (OV-STAD) requires training a model on a limited set of base classes with box and label supervision, which is expected to yield good generalization performance on novel action classes.
no code implementations • 15 Apr 2024 • Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, LiMin Wang
First, we present a query-based adaptive feature sampling module, which endows the detector with the flexibility of mining a group of discriminative features from the entire spatio-temporal domain.
1 code implementation • CVPR 2024 • Tao Wu, Runyu He, Gangshan Wu, LiMin Wang
We hope that SportsHHI can stimulate research on human interaction understanding in videos and promote the development of spatio-temporal context modeling techniques in video visual relation detection.
no code implementations • 15 Mar 2024 • Tao Wu, XueWei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li
Controllable spherical panoramic image generation holds substantial application potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, which result in low-quality content generation. In this paper, we introduce SphereDiffusion, a novel framework that addresses these unique challenges to generate high-quality and precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, by virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these specific techniques, experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and reduces FID by around 35% (relative) on average.
1 code implementation • 21 Dec 2023 • Tao Wu, Tie Luo, Donald C. Wunsch
However, as training progresses, the non-linearity of the loss landscape increases, rendering one-step gradient ascent less effective.
1 code implementation • 20 Dec 2023 • Tao Wu, Tie Luo, Donald C. Wunsch
The transferability of adversarial examples is of central importance to transfer-based black-box adversarial attacks.
no code implementations • 10 Sep 2023 • Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen
In contrast to the abundant research focusing on large-scale models, the progress in lightweight semantic segmentation appears to be advancing at a comparatively slower pace.
no code implementations • 18 Jul 2023 • Lin Yuan, Kai Liang, Xiao Pu, Yan Zhang, Jiaxu Leng, Tao Wu, Nannan Wang, Xinbo Gao
This paper proposes a novel paradigm for facial privacy protection that unifies multiple characteristics including anonymity, diversity, reversibility and security within a single lightweight framework.
no code implementations • 9 Jul 2023 • Tao Wu, Tie Luo, Donald C. Wunsch
Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models, where insider knowledge about the target models is not required.
1 code implementation • 6 Jun 2023 • XueWei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li
Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of our performance is improved by an order of magnitude.
Ranked #4 on Semantic Segmentation on Stanford2D3D Panoramic
no code implementations • CVPR 2023 • Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, LiMin Wang
STMixer is based on two core designs.
1 code implementation • 15 Mar 2023 • Jinxiang Lai, Siqian Yang, Wenlong Wu, Tao Wu, Guannan Jiang, Xi Wang, Jun Liu, Bin-Bin Gao, Wei zhang, Yuan Xie, Chengjie Wang
Then we derive two specific attention modules, named SpatialFormer Semantic Attention (SFSA) and SpatialFormer Target Attention (SFTA), to enhance the target object regions while reducing the background distraction.
1 code implementation • 31 Dec 2022 • Xingping Xian, Tao Wu, Xiaoke Ma, Shaojie Qiao, Yabin Shao, Chao Wang, Lin Yuan, Yu Wu
Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated.
no code implementations • 10 Nov 2022 • Yijia Shao, Mengyu Zhou, Yifan Zhong, Tao Wu, Hongwei Han, Shi Han, Gideon Huang, Dongmei Zhang
To assist form designers, in this work we present FormLM to model online forms (by enhancing a pre-trained language model with form structural information) and recommend form creation ideas (including question/option recommendations and block type suggestions).
no code implementations • 28 Sep 2022 • Tao Wu, Tie Luo, Donald Wunsch
To begin with, we investigate the efficacy of transfer learning in IIR, by comparing off-the-shelf features learned by a pre-trained deep neural network (DNN) classifier with features learned by a CL model.
1 code implementation • CVPR 2022 • Bo wang, Tao Wu, Minfeng Zhu, Peng Du
In particular, the stuff layouts can take amorphous shapes and fill up the missing regions left out by the instance layouts.
Ranked #2 on Layout-to-Image Generation on Visual Genome 128x128
no code implementations • 29 Dec 2021 • Zhengqing Pan, Ruiqian Li, Tian Gao, Zi Wang, Ping Liu, Siyuan Shen, Tao Wu, Jingyi Yu, Shiying Li
There has been an increasing interest in deploying non-line-of-sight (NLOS) imaging systems for recovering objects behind an obstacle.
no code implementations • 9 Oct 2021 • Xiaolong Zheng, Deyun Zhou, Na Li, Yu Lei, Tao Wu, Maoguo Gong
In the focus search strategy, if no knowledge source benefits the optimization of a task, then all knowledge sources in the task's pool except the task itself are forbidden from being used, which helps to improve the performance of the proposed algorithm.
2 code implementations • 10 Sep 2021 • Zhenzhi Wang, LiMin Wang, Tao Wu, TianHao Li, Gangshan Wu
Instead, from a perspective on temporal grounding as a metric-learning problem, we present a Mutual Matching Network (MMN), to directly model the similarity between language queries and video moments in a joint embedding space.
Ranked #3 on Temporal Sentence Grounding on Charades-STA
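The metric-learning view in the MMN entry above (scoring language queries against video moments in a joint embedding space) can be illustrated with a minimal sketch. It is not the MMN implementation: the helper names `cosine` and `best_moment`, the use of cosine similarity, and the hand-written two-dimensional embeddings are all assumptions for illustration. In practice the vectors would come from learned encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_moment(query_vec, moment_vecs):
    """Return the index of the moment most similar to the query, plus all scores."""
    sims = [cosine(query_vec, m) for m in moment_vecs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims


# Toy usage with hypothetical embeddings already in a shared space.
query = [1.0, 0.0]
moments = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
idx, sims = best_moment(query, moments)
print(idx)  # 1
```

Grounding then amounts to ranking candidate moments by similarity to the query embedding.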
no code implementations • CVPR 2021 • Peixian Hong, Tao Wu, AnCong Wu, Xintong Han, Wei-Shi Zheng
Recently, person re-identification (Re-ID) has achieved great progress.
Ranked #5 on Person Re-Identification on VC-Clothes
1 code implementation • 27 Aug 2020 • Scott Graham, Jun-Ki Min, Tao Wu
The purpose of this work is to highlight the content of the Microsoft Recommenders repository and show how it can be used to reduce the time involved in developing recommender systems.
no code implementations • 7 Aug 2020 • Tao Wu, Ellie Ka-In Chio, Heng-Tze Cheng, Yu Du, Steffen Rendle, Dima Kuzmin, Ritesh Agarwal, Li Zhang, John Anderson, Sarvjeet Singh, Tushar Chandra, Ed H. Chi, Wen Li, Ankit Kumar, Xiang Ma, Alex Soares, Nitin Jindal, Pei Cao
In light of these problems, we observed that most online content platforms have both a search and a recommender system that, while having heterogeneous input spaces, can be connected through their common output item space and a shared semantic representation.
no code implementations • 6 May 2020 • Li Wang, Dawei Zhao, Tao Wu, Hao Fu, Zhiyu Wang, Liang Xiao, Xin Xu, Bin Dai
3D moving object detection is one of the most critical tasks in dynamic scene analysis.
1 code implementation • 27 Feb 2020 • Zhenzhang Ye, Thomas Möllenhoff, Tao Wu, Daniel Cremers
Structured convex optimization on weighted graphs finds numerous applications in machine learning and computer vision.
no code implementations • 3 Jan 2020 • Kishaloy Halder, Heng-Tze Cheng, Ellie Ka In Chio, Georgios Roumpos, Tao Wu, Ritesh Agarwal
Users issue queries to Search Engines, and try to find the desired information in the results produced.
no code implementations • 4 Dec 2019 • Pierre Bréchet, Tao Wu, Thomas Möllenhoff, Daniel Cremers
We tackle the challenge of disentangled representation learning in generative adversarial networks (GANs) from the perspective of regularized optimal transport (OT).
1 code implementation • ICCV 2019 • Bjoern Haefner, Zhenzhang Ye, Maolin Gao, Tao Wu, Yvain Quéau, Daniel Cremers
Photometric stereo (PS) techniques nowadays remain constrained to an ideal laboratory setup where modeling and calibration of lighting is amenable.
no code implementations • 27 Mar 2019 • Emanuel Laude, Tao Wu, Daniel Cremers
In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal mapping and Moreau-envelope.
1 code implementation • 31 Jan 2019 • Yuesong Shen, Tao Wu, Csaba Domokos, Daniel Cremers
Probabilistic graphical models are traditionally known for their successes in generative modeling.
no code implementations • 6 Aug 2018 • Lingni Ma, Jörg Stückler, Tao Wu, Daniel Cremers
Dense pixelwise prediction such as semantic segmentation is an up-to-date challenge for deep convolutional neural networks (CNNs).
no code implementations • 20 May 2018 • Tao Wu, Shaojie Qiao, Xingping Xian, Xi-Zhao Wang, Wei Wang, Yanbing Liu
In addition, the model is capable of measuring the importance of microscopic network elements, i.e., nodes and links, in terms of network regularity, thereby allowing us to regulate the reconstructability of networks based on them.
no code implementations • 16 Jan 2018 • Thomas Möllenhoff, Zhenzhang Ye, Tao Wu, Daniel Cremers
We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners.
no code implementations • 4 Jul 2017 • Yvain Quéau, Bastien Durix, Tao Wu, Daniel Cremers, François Lauze, Jean-Denis Durou
The second one directly recovers the depth, by formulating photometric stereo as a system of PDEs which are partially linearized using image ratios.
no code implementations • CVPR 2017 • Yvain Queau, Tao Wu, Francois Lauze, Jean-Denis Durou, Daniel Cremers
This paper tackles the photometric stereo problem in the presence of inaccurate lighting, obtained either by calibration or by an uncalibrated photometric stereo method.
1 code implementation • 20 Apr 2017 • Tao Wu, David Gleich
In this paper we propose the retrospective higher-order Markov process (RHOMP) as a low-parameter model for such sequences.
1 code implementation • NeurIPS 2016 • Tao Wu, Austin R. Benson, David F. Gleich
Spectral clustering and co-clustering are well-known techniques in data analysis, and recent work has extended spectral clustering to square, symmetric tensors and hypermatrices derived from a network.
1 code implementation • 15 Aug 2016 • Tao Wu, David F. Gleich
A sufficient condition for the method to work is $\| H \| < 1$, which greatly limits the usability of this method.
no code implementations • 23 May 2013 • Wenkui Ding, Tao Wu, Tao Qin, Tie-Yan Liu
Previous studies have shown that the pure Price of Anarchy (POA) of GSP is 1.25 when there are two ad slots and 1.259 when there are three ad slots.
Computer Science and Game Theory