Search Results for author: Dongfang Liu

Found 46 papers, 22 papers with code

Re-Imagining Multimodal Instruction Tuning: A Representation View

1 code implementation • 2 Mar 2025 • Yiyang Liu, James Chenhao Liang, Ruixiang Tang, Yugyung Lee, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Lifu Huang, Dongfang Liu, Qifan Wang, Cheng Han

Multimodal instruction tuning has proven to be an effective strategy for achieving zero-shot generalization by fine-tuning pre-trained Large Multimodal Models (LMMs) with instruction-following data.

Instruction Following MME +2

KGIF: Optimizing Relation-Aware Recommendations with Knowledge Graph Information Fusion

no code implementations • 7 Jan 2025 • Dong Hyun Jeon, Wenbo Sun, Houbing Herbert Song, Dongfang Liu, Velasquez Alvaro, Yixin Chloe Xie, Shuteng Niu

This study introduces the Knowledge Graph Attention Network with Information Fusion (KGIF), a specialized framework designed to merge entity and relation embeddings explicitly through a tailored self-attention mechanism.

Attribute Collaborative Filtering +5
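
The excerpt above names the mechanism but not its form. As a loose illustration, the sketch below fuses an entity embedding and a relation embedding with a stock self-attention layer; the module name, dimensions, and output projection are assumptions and do not reproduce KGIF's tailored attention.

```python
import torch
import torch.nn as nn

class EntityRelationFusion(nn.Module):
    """Toy fusion of entity and relation embeddings via self-attention (illustrative only)."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, entity_emb, relation_emb):
        # Treat the two embeddings as a length-2 token sequence per triple,
        # let attention mix them, then pool back to a single fused vector.
        tokens = torch.stack([entity_emb, relation_emb], dim=1)  # (B, 2, D)
        fused, _ = self.attn(tokens, tokens, tokens)             # (B, 2, D)
        return self.proj(fused.mean(dim=1))                      # (B, D)

fusion = EntityRelationFusion()
e, r = torch.randn(8, 64), torch.randn(8, 64)   # stand-ins for KG embeddings
print(fusion(e, r).shape)                       # torch.Size([8, 64])
```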

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

1 code implementation • 18 Nov 2024 • Taowen Wang, Cheng Han, James Chenhao Liang, Wenhao Yang, Dongfang Liu, Luna Xinyu Zhang, Qifan Wang, Jiebo Luo, Ruixiang Tang

In particular, we introduce two untargeted attack objectives that leverage spatial foundations to destabilize robotic actions, and a targeted attack objective that manipulates the robotic trajectory.

Vision-Language-Action
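
To make the untargeted idea concrete, here is a minimal PGD-style sketch that perturbs an input image so the predicted action drifts away from the clean prediction. The `policy` callable, perturbation budget, and MSE deviation measure are illustrative assumptions, not the paper's actual attack objectives.

```python
import torch
import torch.nn.functional as F

def untargeted_action_attack(policy, image, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Push the model's predicted action away from its clean prediction (toy sketch)."""
    with torch.no_grad():
        clean_action = policy(image)              # reference action on the clean image
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        deviation = F.mse_loss(policy(image + delta), clean_action)
        deviation.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()    # ascend on the deviation
            delta.clamp_(-epsilon, epsilon)       # stay inside the L-infinity budget
        delta.grad.zero_()
    return (image + delta).detach()
```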

Target-driven Attack for Large Language Models

no code implementations • 9 Nov 2024 • Chong Zhang, Mingyu Jin, Dong Shu, Taowen Wang, Dongfang Liu, Xiaobo Jin

To solve this problem, we propose our target-driven black-box attack method to maximize the KL divergence between the conditional probabilities of the clean text and the attack text to redefine the attack's goal.

Adversarial Text Language Modeling +2
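
As a reference for the stated objective, the snippet below computes KL(P_clean || P_attack) over next-token distributions given their logits. It is a generic formulation; how the paper estimates these conditional probabilities against a black-box model is not shown.

```python
import torch
import torch.nn.functional as F

def kl_attack_objective(clean_logits, attack_logits):
    """KL divergence between clean and attacked next-token distributions (to be maximized)."""
    p_clean = F.softmax(clean_logits, dim=-1)
    log_p_clean = F.log_softmax(clean_logits, dim=-1)
    log_p_attack = F.log_softmax(attack_logits, dim=-1)
    # KL(P_clean || P_attack) = sum_v p_clean(v) * (log p_clean(v) - log p_attack(v))
    return (p_clean * (log_p_clean - log_p_attack)).sum(dim=-1).mean()
```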

Visual Fourier Prompt Tuning

1 code implementation • 2 Nov 2024 • Runjia Zeng, Cheng Han, Qifan Wang, Chunshu Wu, Tong Geng, Lifu Huang, Ying Nian Wu, Dongfang Liu

To address this challenge, we draw inspiration from human visual cognition, and propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.

Visual Prompt Tuning
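
A minimal sketch of the general idea, assuming learnable prompt tokens passed through a Fourier transform along the embedding dimension and then prepended to the patch tokens; the exact placement and form of the Fourier operation in VFPT may differ.

```python
import torch
import torch.nn as nn

class FourierPromptedInput(nn.Module):
    """Prepend Fourier-transformed learnable prompts to a ViT token sequence (loose sketch)."""

    def __init__(self, num_prompts: int = 10, dim: int = 768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, patch_tokens):                                  # (B, N, D)
        # FFT along the embedding dimension; keep the real part so the prompts
        # remain compatible with the transformer's real-valued tokens.
        fourier_prompts = torch.fft.fft(self.prompts, dim=-1).real    # (P, D)
        prompts = fourier_prompts.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        return torch.cat([prompts, patch_tokens], dim=1)              # (B, P + N, D)
```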

M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

2 code implementations • 24 Sep 2024 • Taowen Wang, Yiyang Liu, James Chenhao Liang, Junhan Zhao, Yiming Cui, Yuning Mao, Shaoliang Nie, Jiahao Liu, Fuli Feng, Zenglin Xu, Cheng Han, Lifu Huang, Qifan Wang, Dongfang Liu

Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks.

Zero-shot Generalization

Visual Agents as Fast and Slow Thinkers

1 code implementation • 16 Aug 2024 • Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Tong Geng, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu

With this novel design, we advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which contribute to its ability to emulate human-like cognitive processes in visual intelligence.

Question Answering Reasoning Segmentation +1

Radiance Field Learners As UAV First-Person Viewers

no code implementations • 10 Aug 2024 • Liqi Yan, Qifan Wang, Junhan Zhao, Qiang Guan, Zheng Tang, Jianhui Zhang, Dongfang Liu

First-Person-View (FPV) holds immense potential for revolutionizing the trajectory of Unmanned Aerial Vehicles (UAVs), offering an exhilarating avenue for navigating complex building structures.

NeRF

Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks

1 code implementation • 9 Jun 2024 • Zhiyuan Cheng, Cheng Han, James Liang, Qifan Wang, Xiangyu Zhang, Dongfang Liu

Our experiments with two representative MDE networks demonstrate improved robustness against various adversarial attacks, with minimal impact on benign performance.

Adversarial Robustness Autonomous Driving +2

ProMotion: Prototypes As Motion Learners

no code implementations • CVPR 2024 • Yawen Lu, Dongfang Liu, Qifan Wang, Cheng Han, Yiming Cui, Zhiwen Cao, Xueling Zhang, Yingjie Victor Chen, Heng Fan

We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion.

Prototypical Transformer as Unified Motion Learners

no code implementations • 3 Jun 2024 • Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu

In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective.

Object Tracking Representation Learning +1

SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving

no code implementations • 29 May 2024 • Yiming Cui, Cheng Han, Dongfang Liu

Spatial global-local aggregation fuses the local information from neighboring frames with the global semantics of the current frame to eliminate feature degradation.

Autonomous Driving Object +2

Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

1 code implementation • 23 Jan 2024 • Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu

As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has gained attention due to its superior performance compared to traditional full-finetuning.

Transfer Learning Visual Prompt Tuning

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

1 code implementation • 1 Dec 2023 • Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan

Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy to achieve multimodal semantic segmentation, which is costly to train due to the massive parameter updates in feature extraction and fusion.

Decoder object-detection +8

ClusterFormer: Clustering As A Universal Visual Learner

1 code implementation • 22 Sep 2023 • James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu

This paper presents CLUSTERFORMER, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER.

Clustering Image Classification +7
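
One plausible reading of "clustering as a universal visual learner" is a recurrent E-step/M-step between tokens and cluster centers, sketched below; the shapes, temperature, and update rule are assumptions rather than CLUSTERFORMER's actual recurrence.

```python
import torch

def cluster_tokens(tokens, centers, iters=3, tau=1.0):
    """Soft-cluster patch tokens around centers and refine the centers (generic sketch).

    tokens:  (B, N, D) patch features
    centers: (B, K, D) initial cluster centers (e.g., learned queries)
    """
    for _ in range(iters):
        # E-step: soft-assign each token to the centers by scaled similarity.
        sim = torch.einsum("bnd,bkd->bnk", tokens, centers) / tau
        assign = sim.softmax(dim=-1)                                   # (B, N, K)
        # M-step: recompute each center as an assignment-weighted token mean.
        weights = assign / (assign.sum(dim=1, keepdim=True) + 1e-6)
        centers = torch.einsum("bnk,bnd->bkd", weights, tokens)
    return centers, assign
```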

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

1 code implementation • ICCV 2023 • Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu

Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning.

Visual Prompt Tuning
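
A simplified sketch of that sentence, assuming learnable prompts concatenated to the keys and values of a standard multi-head self-attention block (the usual input-level visual prompts are not shown); E^2VPT's prompt pruning and exact parameterization are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVPromptedSelfAttention(nn.Module):
    """Self-attention with extra learnable key-value prompt tokens (simplified sketch)."""

    def __init__(self, dim=768, heads=12, num_kv_prompts=5):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.k_prompt = nn.Parameter(torch.randn(num_kv_prompts, dim) * 0.02)
        self.v_prompt = nn.Parameter(torch.randn(num_kv_prompts, dim) * 0.02)

    def forward(self, x):                                    # (B, N, D)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Prepend the prompt tokens to keys and values only; queries stay untouched.
        k = torch.cat([self.k_prompt.expand(B, -1, -1), k], dim=1)
        v = torch.cat([self.v_prompt.expand(B, -1, -1), v], dim=1)

        def split(t):                                        # (B, L, D) -> (B, heads, L, dk)
            return t.reshape(B, t.size(1), self.heads, self.dk).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(y)
```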

CLUSTSEG: Clustering for Universal Segmentation

1 code implementation • 3 May 2023 • James Liang, Tianfei Zhou, Dongfang Liu, Wenguan Wang

We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.

Instance Segmentation Panoptic Segmentation +3

Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection

1 code implementation • 28 Apr 2023 • Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang

We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks.

3D Object Detection Autonomous Driving +2

TransFlow: Transformer as Flow Learner

no code implementations • CVPR 2023 • Yawen Lu, Qifan Wang, Siqi Ma, Tong Geng, Yingjie Victor Chen, Huaijin Chen, Dongfang Liu

Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement.

Motion Estimation object-detection +4

Exploiting Logic Locking for a Neural Trojan Attack on Machine Learning Accelerators

no code implementations • 12 Apr 2023 • Hongye Xu, Dongfang Liu, Cory Merkel, Michael Zuzak

If an incorrect secret key is used, a set of deterministic errors is produced in locked modules, restricting unauthorized use.
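
For intuition about key-gated deterministic errors, here is a toy model of XOR-based logic locking on a 4-bit adder; the key width, secret value, and module are made up for this example and are unrelated to the accelerator studied in the paper.

```python
def locked_adder(a: int, b: int, key: int, secret_key: int = 0b1011) -> int:
    """Toy XOR-locked 4-bit adder: correct key -> correct sum, wrong key -> fixed bit errors."""
    unlocked_sum = (a + b) & 0xF
    # Key gates: XOR the output with (key ^ secret_key), which is zero iff the key is correct.
    return unlocked_sum ^ (key ^ secret_key)

print(locked_adder(3, 5, key=0b1011))  # 8 -> correct key, correct sum
print(locked_adder(3, 5, key=0b0000))  # 3 -> wrong key, deterministic error
```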

Learning Equivariant Segmentation with Instance-Unique Querying

1 code implementation • 3 Oct 2022 • Wenguan Wang, James Liang, Dongfang Liu

Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings.

Instance Segmentation Semantic Segmentation
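
The query-based scheme described above reduces, in its simplest form, to an inner product between instance embeddings and a dense feature map, as sketched below; the paper's equivariance constraints and instance-unique training objective are not reproduced.

```python
import torch

def masks_from_queries(pixel_features, instance_queries):
    """Derive soft instance masks by querying a dense feature map (generic sketch).

    pixel_features:   (B, D, H, W) decoder features
    instance_queries: (B, Q, D) instance-aware embeddings
    """
    logits = torch.einsum("bqd,bdhw->bqhw", instance_queries, pixel_features)
    return logits.sigmoid()                       # one soft mask per query

feats = torch.randn(2, 64, 32, 32)
queries = torch.randn(2, 10, 64)
print(masks_from_queries(feats, queries).shape)   # torch.Size([2, 10, 32, 32])
```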

Visual Recognition with Deep Nearest Centroids

1 code implementation • 15 Sep 2022 • Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu

We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers.

Decision Making Image Classification +1
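
As a reminder of the classic component being revisited, the sketch below fits one centroid per class on feature vectors and classifies by the nearest centroid; DNC's deep feature learning and sub-centroid machinery are omitted, and the random data is only a shape check.

```python
import torch

def fit_centroids(features, labels, num_classes):
    """Class centroid = mean feature vector of that class (the nearest-centroids classifier)."""
    return torch.stack([features[labels == c].mean(dim=0) for c in range(num_classes)])

def predict(features, centroids):
    """Assign each sample to the class of its nearest centroid (Euclidean distance)."""
    return torch.cdist(features, centroids).argmin(dim=1)

feats = torch.randn(100, 16)
labels = torch.randint(0, 3, (100,))
centroids = fit_centroids(feats, labels, num_classes=3)
print(predict(feats[:5], centroids))
```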

Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian

no code implementations • 19 Aug 2022 • Zhiwen Cao, Dongfang Liu, Qifan Wang, Yingjie Chen

In this paper, we propose an Anisotropic Spherical Gaussian (ASG)-based LDL approach for facial pose estimation.

Pose Estimation

Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches

2 code implementations • 11 Jul 2022 • Zhiyuan Cheng, James Liang, Hongjun Choi, Guanhong Tao, Zhiwen Cao, Dongfang Liu, Xiangyu Zhang

Experimental results show that our method can generate stealthy, effective, and robust adversarial patches for different target objects and models, achieving a mean depth estimation error of more than 6 meters and a 93% attack success rate (ASR) in object detection with a patch covering 1/9 of the vehicle's rear area.

3D Object Detection Autonomous Driving +3

GL-RG: Global-Local Representation Granularity for Video Captioning

1 code implementation • 22 May 2022 • Liqi Yan, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description.

Caption Generation Descriptive +1

Deep Partial Multiplex Network Embedding

no code implementations • 5 Mar 2022 • Qifan Wang, Yi Fang, Anirudh Ravula, Ruining He, Bin Shen, Jingang Wang, Xiaojun Quan, Dongfang Liu

Network embedding is an effective technique to learn the low-dimensional representations of nodes in networks.

Link Prediction Network Embedding +1

WebFormer: The Web-page Transformer for Structure Information Extraction

no code implementations • 1 Feb 2022 • Qifan Wang, Yi Fang, Anirudh Ravula, Fuli Feng, Xiaojun Quan, Dongfang Liu

Structure information extraction refers to the task of extracting structured text fields from web pages, such as extracting a product offer from a shopping page including product title, description, brand and price.

Deep Attention document understanding +1

DG-Labeler and DGL-MOTS Dataset: Boost the Autonomous Driving Perception

no code implementations • 15 Oct 2021 • Yiming Cui, Zhiwen Cao, Yixin Xie, Xingyu Jiang, Feng Tao, Yingjie Chen, Lin Li, Dongfang Liu

The existing MOTS studies face two critical challenges: 1) the published datasets inadequately capture the real-world complexity needed to train networks for various driving settings; 2) the annotation tool and its working pipeline are under-studied in the literature, which limits the quality of MOTS learning examples.

Autonomous Driving Diversity +2

TF-Blender: Temporal Feature Blender for Video Object Detection

1 code implementation • ICCV 2021 • Yiming Cui, Liqi Yan, Zhiwen Cao, Dongfang Liu

One of the popular solutions is to exploit the temporal information and enhance per-frame representation through aggregating features from neighboring frames.

Object object-detection +1
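
A plain similarity-weighted aggregation baseline makes the quoted idea concrete: neighboring-frame features are weighted by their cosine similarity to the current frame and added back. The shapes, temperature, and softmax weighting are assumptions; TF-Blender's learned blending is more involved.

```python
import torch
import torch.nn.functional as F

def aggregate_neighbor_features(current, neighbors, tau=0.1):
    """Enhance per-frame features with similarity-weighted neighbor features (baseline sketch).

    current:   (N, D) per-location features of the current frame
    neighbors: (T, N, D) features of T neighboring frames at aligned locations
    """
    sims = F.cosine_similarity(neighbors, current.unsqueeze(0), dim=-1)   # (T, N)
    weights = (sims / tau).softmax(dim=0).unsqueeze(-1)                   # (T, N, 1)
    aggregated = (weights * neighbors).sum(dim=0)                         # (N, D)
    return current + aggregated
```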

Hierarchical Attention Fusion for Geo-Localization

1 code implementation • 18 Feb 2021 • Liqi Yan, Yiming Cui, Yingjie Chen, Dongfang Liu

We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations.

geo-localization Image Retrieval +1

Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images With Artificial Neural Networks

no code implementations • ICCV 2021 • Alireza Naghizadeh, Hongye Xu, Mohab Mohamed, Dimitris N. Metaxas, Dongfang Liu

The importance of this subject lies in the amount of training data that artificial neural networks need to accurately identify and segment objects in images, and in the infeasibility of acquiring a sufficiently large dataset within the biomedical field.

Data Augmentation object-detection +3

A Vector-based Representation to Enhance Head Pose Estimation

no code implementations • 14 Oct 2020 • Zhiwen Cao, Zongcheng Chu, Dongfang Liu, Yingjie Chen

This paper proposes to use the three vectors of a rotation matrix as the representation for head pose estimation and develops a new neural network based on the characteristics of this representation.

Head Pose Estimation
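
To show why a three-vector representation is convenient to work with, the sketch below orthonormalizes two noisy predicted vectors (Gram-Schmidt) and recovers the third with a cross product, giving a valid rotation matrix; the paper's network and training loss are not reproduced.

```python
import numpy as np

def vectors_to_rotation(v1, v2):
    """Build a valid rotation matrix from two (noisy) predicted direction vectors."""
    r1 = v1 / np.linalg.norm(v1)
    v2 = v2 - np.dot(v2, r1) * r1        # remove the component along r1
    r2 = v2 / np.linalg.norm(v2)
    r3 = np.cross(r1, r2)                # third column enforces right-handedness
    return np.stack([r1, r2, r3], axis=1)

R = vectors_to_rotation(np.array([1.0, 0.1, 0.0]), np.array([0.0, 1.0, 0.1]))
print(np.allclose(R.T @ R, np.eye(3), atol=1e-6))   # True: orthonormal columns
```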

Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning

no code implementations • 1 Sep 2020 • Liqi Yan, Dongfang Liu, Yaoxian Song, Changbin Yu

Memory is important for the agent to avoid unnecessarily repeating certain tasks and to adapt adequately to new scenes; therefore, we make use of meta-learning.

Meta-Learning Visual Navigation
