Search Results for author: Tao Wu

Found 58 papers, 19 papers with code

Enabling Heterogeneous Adversarial Transferability via Feature Permutation Attacks

no code implementations · 26 Mar 2025 · Tao Wu, Tie Luo

To address this, we propose Feature Permutation Attack (FPA), a zero-FLOP, parameter-free method that enhances adversarial transferability across diverse architectures.

TransiT: Transient Transformer for Non-line-of-sight Videography

no code implementations · 14 Mar 2025 · Ruiqian Li, Siyuan Shen, Suan Xia, Ziheng Wang, Xingyue Peng, Chengxuan Song, Yingsheng Zhu, Tao Wu, Shiying Li, Jingyi Yu

High frame rates, for example, can be achieved by reducing either per-point scanning time or scanning density, but at the cost of lowering the information density at individual frames.

Autonomous Navigation, Transfer Learning

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control

no code implementations · 14 Feb 2025 · Teng Li, Guangcong Zheng, Rui Jiang, Shuigenzhan, Tao Wu, Yehao Lu, Yining Lin, Xi Li

Recent advancements in camera-trajectory-guided image-to-video generation offer higher precision and better support for complex camera control compared to text-based approaches.

3D Scene Reconstruction, Depth Estimation

WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge

no code implementations · 22 Jan 2025 · Jingyuan Chen, Tao Wu, Wei Ji, Fei Wu

Large language models (LLMs) have emerged as powerful tools in natural language processing (NLP), showing promise as a path toward artificial general intelligence (AGI).

Retrieval

iFADIT: Invertible Face Anonymization via Disentangled Identity Transform

no code implementations · 8 Jan 2025 · Lin Yuan, Kai Liang, Xiong Li, Tao Wu, Nannan Wang, Xinbo Gao

However, many still face limitations in visual quality and often overlook the potential to recover the original face from the anonymized version, which can be valuable in specific contexts such as image forensics.

Disentanglement, Face Anonymization

Online Video Understanding: OVBench and VideoChat-Online

no code implementations · 31 Dec 2024 · Zhenpeng Huang, Xinhao Li, Jiaqi Li, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, LiMin Wang

Despite the lower computational cost and higher efficiency, VideoChat-Online outperforms existing state-of-the-art offline and online models across popular offline video benchmarks and OVBench, demonstrating the effectiveness of our model architecture and training strategy.

Autonomous Driving, Question Answering

Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study

1 code implementation · 29 Dec 2024 · Yulin Fei, Yuhui Gao, Xingyuan Xian, Xiaojin Zhang, Tao Wu, Wei Chen

With the rise of multimodal large language models, accurately extracting and understanding textual information from video content, referred to as video-based optical character recognition (Video OCR), has become a crucial capability.

Motion Detection, Optical Character Recognition

NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries

no code implementations · 14 Dec 2024 · Tao Wu, Chuhao Zhou, Yen Heng Wong, Lin Gu, Jianfei Yang

We also propose a 'Self-Correction' prompting mechanism and a new evaluation metric to enhance and measure both noise-detection capability and answer quality.

Benchmarking, Embodied Question Answering

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

1 code implementation · 5 Dec 2024 · Jun Zhang, Desen Meng, Ji Qi, Zhenpeng Huang, Tao Wu, LiMin Wang

In this paper, we propose to build efficient MLLMs by leveraging the Mixture-of-Depths (MoD) mechanism, where each transformer decoder layer selects essential vision tokens to process while skipping redundant ones.

Decoder

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

1 code implementation · 21 Oct 2024 · Guangcong Zheng, Teng Li, Rui Jiang, Yehao Lu, Tao Wu, Xi Li

We associate the quality of a condition with its ability to reduce uncertainty, and interpret noisy cross-frame features as a form of noisy condition.

GISExplainer: On Explainability of Graph Neural Networks via Game-theoretic Interaction Subgraphs

no code implementations · 24 Sep 2024 · Xingping Xian, Jianlu Liu, Chao Wang, Tao Wu, Shaojie Qiao, Xiaochuan Tang, Qun Liu

First, GISExplainer defines a causal attribution mechanism that considers the game-theoretic interaction of multi-granularity coalitions in the candidate explanatory subgraph to quantify the causal effect of an edge on the prediction.

Computational Efficiency, Node Classification

AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding

no code implementations · 28 Aug 2024 · Zihan Huang, Tao Wu, Wang Lin, Shengyu Zhang, Jingyuan Chen, Fei Wu

With the rapid advancement of large language models, there has been a growing interest in their capabilities in mathematical reasoning.

Mathematical Reasoning

Semantic Alignment for Multimodal Large Language Models

no code implementations · 23 Aug 2024 · Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu

By involving the bidirectional semantic guidance between different images in the visual-token extraction process, SAM aims to enhance the preservation of linking information for coherent analysis and align the semantics of different images before feeding them into LLM.

Large Language Model, Visual Storytelling

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

1 code implementation · 23 Aug 2024 · Tao Wu, Yong Zhang, Xintao Wang, Xianpan Zhou, Guangcong Zheng, Zhongang Qi, Ying Shan, Xi Li

In this paper, we propose CustomCrafter, a novel framework that preserves the model's motion generation and conceptual combination abilities without requiring additional video data or fine-tuning to recover them.

Denoising, Motion Generation

HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

no code implementations · 10 Jul 2024 · Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen, Guangwei Gao

In this paper, we introduce HAFormer, a model that combines the hierarchical feature extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges.

Semantic Segmentation

Explainable AI Security: Exploring Robustness of Graph Neural Networks to Adversarial Attacks

no code implementations · 20 Jun 2024 · Tao Wu, Canyixing Cui, Xingping Xian, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu

Graph neural networks (GNNs) have achieved tremendous success, but recent studies have shown that GNNs are vulnerable to adversarial attacks, which significantly hinders their use in safety-critical scenarios.

Adversarial Robustness

Open-Vocabulary Spatio-Temporal Action Detection

no code implementations · 17 May 2024 · Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, LiMin Wang

Open-vocabulary spatio-temporal action detection (OV-STAD) requires training a model on a limited set of base classes with box and label supervision, which is expected to yield good generalization performance on novel action classes.

Fine-Grained Action Detection, Video Understanding

STMixer: A One-Stage Sparse Action Detector

no code implementations · 15 Apr 2024 · Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, LiMin Wang

First, we present a query-based adaptive feature sampling module, which endows the detector with the flexibility of mining a group of discriminative features from the entire spatio-temporal domain.

Action Detection

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

1 code implementation · CVPR 2024 · Tao Wu, Runyu He, Gangshan Wu, LiMin Wang

We hope that SportsHHI can stimulate research on human interaction understanding in videos and promote the development of spatio-temporal context modeling techniques in video visual relation detection.

Graph Generation, Relation

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

no code implementations · 15 Mar 2024 · Tao Wu, XueWei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li

Controllable spherical panoramic image generation holds substantial application potential across a variety of domains. However, it remains challenging because the inherent spherical distortion and geometry characteristics lead to low-quality generated content. In this paper, we introduce a novel framework, SphereDiffusion, to address these unique challenges and generate high-quality, precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, by virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these techniques, experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and reduces FID by around 35% (relative) on average.

Denoising, Diversity

CR-SAM: Curvature Regularized Sharpness-Aware Minimization

1 code implementation · 21 Dec 2023 · Tao Wu, Tie Luo, Donald C. Wunsch

However, as training progresses, the non-linearity of the loss landscape increases, rendering one-step gradient ascent less effective.

LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate

1 code implementation · 20 Dec 2023 · Tao Wu, Tie Luo, Donald C. Wunsch

The transferability of adversarial examples is of central importance to transfer-based black-box adversarial attacks.

Adversarial Robustness

MFPNet: Multi-scale Feature Propagation Network For Lightweight Semantic Segmentation

no code implementations · 10 Sep 2023 · Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen

In contrast to the abundant research focusing on large-scale models, the progress in lightweight semantic segmentation appears to be advancing at a comparatively slower pace.

Decoder, Segmentation

PRO-Face S: Privacy-preserving Reversible Obfuscation of Face Images via Secure Flow

no code implementations · 18 Jul 2023 · Lin Yuan, Kai Liang, Xiao Pu, Yan Zhang, Jiaxu Leng, Tao Wu, Nannan Wang, Xinbo Gao

This paper proposes a novel paradigm for facial privacy protection that unifies multiple characteristics including anonymity, diversity, reversibility and security within a single lightweight framework.

Diversity, Privacy Preserving

GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty

no code implementations · 9 Jul 2023 · Tao Wu, Tie Luo, Donald C. Wunsch

Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models, where insider knowledge about the target models is not required.

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

1 code implementation · 6 Jun 2023 · XueWei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li

Experimental results on the Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with an mIoU increase of approximately 2%, and that when small 3D disturbances occur in the data, the stability of its performance improves by an order of magnitude.

Semantic Segmentation

SpatialFormer: Semantic and Target Aware Attentions for Few-Shot Learning

1 code implementation · 15 Mar 2023 · Jinxiang Lai, Siqian Yang, Wenlong Wu, Tao Wu, Guannan Jiang, Xi Wang, Jun Liu, Bin-Bin Gao, Wei zhang, Yuan Xie, Chengjie Wang

Then we derive two specific attention modules, named SpatialFormer Semantic Attention (SFSA) and SpatialFormer Target Attention (SFTA), to enhance the target object regions while reducing background distraction.

Few-Shot Learning

Generative Graph Neural Networks for Link Prediction

1 code implementation · 31 Dec 2022 · Xingping Xian, Tao Wu, Xiaoke Ma, Shaojie Qiao, Yabin Shao, Chao Wang, Lin Yuan, Yu Wu

Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated.

Link Prediction, Prediction

FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information

no code implementations · 10 Nov 2022 · Yijia Shao, Mengyu Zhou, Yifan Zhong, Tao Wu, Hongwei Han, Shi Han, Gideon Huang, Dongmei Zhang

To assist form designers, in this work we present FormLM to model online forms (by enhancing pre-trained language model with form structural information) and recommend form creation ideas (including question / options recommendations and block type suggestion).

Form, Language Modeling

Learning Deep Representations via Contrastive Learning for Instance Retrieval

no code implementations · 28 Sep 2022 · Tao Wu, Tie Luo, Donald Wunsch

To begin with, we investigate the efficacy of transfer learning in instance-level image retrieval (IIR) by comparing off-the-shelf features learned by a pre-trained deep neural network (DNN) classifier with features learned by a contrastive learning (CL) model.

Contrastive Learning, Image Retrieval

Onsite Non-Line-of-Sight Imaging via Online Calibrations

no code implementations · 29 Dec 2021 · Zhengqing Pan, Ruiqian Li, Tian Gao, Zi Wang, Ping Liu, Siyuan Shen, Tao Wu, Jingyi Yu, Shiying Li

There has been an increasing interest in deploying non-line-of-sight (NLOS) imaging systems for recovering objects behind an obstacle.

Object

Self-adaptive Multi-task Particle Swarm Optimization

no code implementations · 9 Oct 2021 · Xiaolong Zheng, Deyun Zhou, Na Li, Yu Lei, Tao Wu, Maoguo Gong

In the focus search strategy, if no knowledge source benefits the optimization of a task, then all knowledge sources in the task's pool except the task itself are forbidden from being used, which helps to improve the performance of the proposed algorithm.

Evolutionary Algorithms, Transfer Learning

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

2 code implementations · 10 Sep 2021 · Zhenzhi Wang, LiMin Wang, Tao Wu, TianHao Li, Gangshan Wu

Instead, viewing temporal grounding as a metric-learning problem, we present a Mutual Matching Network (MMN) to directly model the similarity between language queries and video moments in a joint embedding space.

Metric Learning, Representation Learning

Microsoft Recommenders: Tools to Accelerate Developing Recommender Systems

1 code implementation · 27 Aug 2020 · Scott Graham, Jun-Ki Min, Tao Wu

The purpose of this work is to highlight the content of the Microsoft Recommenders repository and show how it can be used to reduce the time involved in developing recommender systems.

Multi-Domain Recommender Systems

Zero-Shot Heterogeneous Transfer Learning from Recommender Systems to Cold-Start Search Retrieval

no code implementations · 7 Aug 2020 · Tao Wu, Ellie Ka-In Chio, Heng-Tze Cheng, Yu Du, Steffen Rendle, Dima Kuzmin, Ritesh Agarwal, Li Zhang, John Anderson, Sarvjeet Singh, Tushar Chandra, Ed H. Chi, Wen Li, Ankit Kumar, Xiang Ma, Alex Soares, Nitin Jindal, Pei Cao

In light of these problems, we observed that most online content platforms have both a search and a recommender system that, while having heterogeneous input spaces, can be connected through their common output item space and a shared semantic representation.

Information Retrieval, Recommendation Systems

Optimization of Graph Total Variation via Active-Set-based Combinatorial Reconditioning

1 code implementation · 27 Feb 2020 · Zhenzhang Ye, Thomas Möllenhoff, Tao Wu, Daniel Cremers

Structured convex optimization on weighted graphs finds numerous applications in machine learning and computer vision.

Modeling Information Need of Users in Search Sessions

no code implementations · 3 Jan 2020 · Kishaloy Halder, Heng-Tze Cheng, Ellie Ka In Chio, Georgios Roumpos, Tao Wu, Ritesh Agarwal

Users issue queries to search engines and try to find the desired information in the results produced.

Informative GANs via Structured Regularization of Optimal Transport

no code implementations · 4 Dec 2019 · Pierre Bréchet, Tao Wu, Thomas Möllenhoff, Daniel Cremers

We tackle the challenge of disentangled representation learning in generative adversarial networks (GANs) from the perspective of regularized optimal transport (OT).

Representation Learning

Variational Uncalibrated Photometric Stereo under General Lighting

1 code implementation · ICCV 2019 · Bjoern Haefner, Zhenzhang Ye, Maolin Gao, Tao Wu, Yvain Quéau, Daniel Cremers

Photometric stereo (PS) techniques still remain constrained to an ideal laboratory setup where lighting can be readily modeled and calibrated.

Optimization of Inf-Convolution Regularized Nonconvex Composite Problems

no code implementations · 27 Mar 2019 · Emanuel Laude, Tao Wu, Daniel Cremers

In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal mapping and the Moreau envelope.

Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform

no code implementations · 6 Aug 2018 · Lingni Ma, Jörg Stückler, Tao Wu, Daniel Cremers

Dense pixelwise prediction, such as semantic segmentation, remains an open challenge for deep convolutional neural networks (CNNs).

Decoder, Semantic Segmentation

Network Reconstruction and Controlling Based on Structural Regularity Analysis

no code implementations · 20 May 2018 · Tao Wu, Shaojie Qiao, Xingping Xian, Xi-Zhao Wang, Wei Wang, Yanbing Liu

In addition, the model is capable of measuring the importance of microscopic network elements, i.e., nodes and links, in terms of network regularity, thereby allowing us to regulate the reconstructability of networks based on them.

Combinatorial Preconditioners for Proximal Algorithms on Graphs

no code implementations · 16 Jan 2018 · Thomas Möllenhoff, Zhenzhang Ye, Tao Wu, Daniel Cremers

We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners.

BIG-bench Machine Learning

LED-based Photometric Stereo: Modeling, Calibration and Numerical Solution

no code implementations · 4 Jul 2017 · Yvain Quéau, Bastien Durix, Tao Wu, Daniel Cremers, François Lauze, Jean-Denis Durou

The second one directly recovers the depth, by formulating photometric stereo as a system of PDEs which are partially linearized using image ratios.

A Non-Convex Variational Approach to Photometric Stereo Under Inaccurate Lighting

no code implementations · CVPR 2017 · Yvain Quéau, Tao Wu, François Lauze, Jean-Denis Durou, Daniel Cremers

This paper tackles the photometric stereo problem in the presence of inaccurate lighting, obtained either by calibration or by an uncalibrated photometric stereo method.

Retrospective Higher-Order Markov Processes for User Trails

1 code implementation · 20 Apr 2017 · Tao Wu, David Gleich

In this paper we propose the retrospective higher-order Markov process (RHOMP) as a low-parameter model for such sequences.

General Tensor Spectral Co-clustering for Higher-Order Data

1 code implementation · NeurIPS 2016 · Tao Wu, Austin R. Benson, David F. Gleich

Spectral clustering and co-clustering are well-known techniques in data analysis, and recent work has extended spectral clustering to square, symmetric tensors and hypermatrices derived from a network.

Clustering

Multi-way Monte Carlo Method for Linear Systems

1 code implementation · 15 Aug 2016 · Tao Wu, David F. Gleich

A sufficient condition for the method to work is $\| H \| < 1$, which greatly limits the usability of this method.

Pure Price of Anarchy for Generalized Second Price Auction

no code implementations · 23 May 2013 · Wenkui Ding, Tao Wu, Tao Qin, Tie-Yan Liu

Previous studies have shown that the pure Price of Anarchy (POA) of GSP is 1.25 when there are two ad slots and 1.259 when there are three ad slots.

Computer Science and Game Theory
