Search Results for author: Dongfang Liu

Found 29 papers, 14 papers with code

Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

no code implementations 23 Jan 2024 Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu

As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has gained attention due to its superior performance compared to traditional full-finetuning.

Transfer Learning Visual Prompt Tuning

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

1 code implementation 1 Dec 2023 Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan

Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy for achieving multimodal semantic segmentation, which is training-costly due to the massive parameter updates in feature extraction and fusion.

Ranked #2 on Semantic Segmentation on SUN-RGBD (using extra training data)

object-detection Object Detection +6

ClusterFormer: Clustering As A Universal Visual Learner

1 code implementation 22 Sep 2023 James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu

This paper presents CLUSTERFORMER, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER.

Clustering Image Classification +7

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

1 code implementation ICCV 2023 Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu

Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning.

Visual Prompt Tuning

CLUSTSEG: Clustering for Universal Segmentation

1 code implementation 3 May 2023 James Liang, Tianfei Zhou, Dongfang Liu, Wenguan Wang

We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.

Instance Segmentation Panoptic Segmentation +3

Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection

no code implementations 28 Apr 2023 Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang

We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks.

3D Object Detection Autonomous Driving +2

TransFlow: Transformer as Flow Learner

no code implementations CVPR 2023 Yawen Lu, Qifan Wang, Siqi Ma, Tong Geng, Yingjie Victor Chen, Huaijin Chen, Dongfang Liu

Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement.

Motion Estimation object-detection +4

Exploiting Logic Locking for a Neural Trojan Attack on Machine Learning Accelerators

no code implementations 12 Apr 2023 Hongye Xu, Dongfang Liu, Cory Merkel, Michael Zuzak

If an incorrect secret key is used, a set of deterministic errors is produced in locked modules, restricting unauthorized use.

Learning Equivariant Segmentation with Instance-Unique Querying

1 code implementation 3 Oct 2022 Wenguan Wang, James Liang, Dongfang Liu

Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings.

Instance Segmentation Semantic Segmentation

Visual Recognition with Deep Nearest Centroids

1 code implementation 15 Sep 2022 Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu

We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers.

Decision Making Image Classification +1

Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian

no code implementations 19 Aug 2022 Zhiwen Cao, Dongfang Liu, Qifan Wang, Yingjie Chen

In this paper, we propose an Anisotropic Spherical Gaussian (ASG)-based LDL approach for facial pose estimation.

Pose Estimation

Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches

1 code implementation 11 Jul 2022 Zhiyuan Cheng, James Liang, Hongjun Choi, Guanhong Tao, Zhiwen Cao, Dongfang Liu, Xiangyu Zhang

Experimental results show that our method can generate stealthy, effective, and robust adversarial patches for different target objects and models and achieves more than 6 meters mean depth estimation error and 93% attack success rate (ASR) in object detection with a patch of 1/9 of the vehicle's rear area.

3D Object Detection Autonomous Driving +3

GL-RG: Global-Local Representation Granularity for Video Captioning

1 code implementation 22 May 2022 Liqi Yan, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description.

Caption Generation Descriptive +1

Deep Partial Multiplex Network Embedding

no code implementations 5 Mar 2022 Qifan Wang, Yi Fang, Anirudh Ravula, Ruining He, Bin Shen, Jingang Wang, Xiaojun Quan, Dongfang Liu

Network embedding is an effective technique to learn the low-dimensional representations of nodes in networks.

Link Prediction Network Embedding +1

WebFormer: The Web-page Transformer for Structure Information Extraction

no code implementations 1 Feb 2022 Qifan Wang, Yi Fang, Anirudh Ravula, Fuli Feng, Xiaojun Quan, Dongfang Liu

Structure information extraction refers to the task of extracting structured text fields from web pages, such as extracting a product offer from a shopping page including product title, description, brand and price.

Deep Attention document understanding +1

DG-Labeler and DGL-MOTS Dataset: Boost the Autonomous Driving Perception

no code implementations 15 Oct 2021 Yiming Cui, Zhiwen Cao, Yixin Xie, Xingyu Jiang, Feng Tao, Yingjie Chen, Lin Li, Dongfang Liu

The existing MOTS studies face two critical challenges: 1) the published datasets inadequately capture the real-world complexity for network training to address various driving settings; 2) the working pipeline annotation tool is under-studied in the literature to improve the quality of MOTS learning examples.

Autonomous Driving Multi-Object Tracking +1

TF-Blender: Temporal Feature Blender for Video Object Detection

1 code implementation ICCV 2021 Yiming Cui, Liqi Yan, Zhiwen Cao, Dongfang Liu

One of the popular solutions is to exploit the temporal information and enhance per-frame representation through aggregating features from neighboring frames.

Object object-detection +1

Hierarchical Attention Fusion for Geo-Localization

1 code implementation 18 Feb 2021 Liqi Yan, Yiming Cui, Yingjie Chen, Dongfang Liu

We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations.

Image Retrieval Retrieval

Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images With Artificial Neural Networks

no code implementations ICCV 2021 Alireza Naghizadeh, Hongye Xu, Mohab Mohamed, Dimitris N. Metaxas, Dongfang Liu

The importance of this subject is nested in the amount of training data that artificial neural networks need to accurately identify and segment objects in images and the infeasibility of acquiring a sufficient dataset within the biomedical field.

Data Augmentation object-detection +3

A Vector-based Representation to Enhance Head Pose Estimation

no code implementations 14 Oct 2020 Zhiwen Cao, Zongcheng Chu, Dongfang Liu, Yingjie Chen

This paper proposes to use the three vectors in a rotation matrix as the representation in head pose estimation and develops a new neural network based on the characteristic of such representation.

Head Pose Estimation

Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning

no code implementations 1 Sep 2020 Liqi Yan, Dongfang Liu, Yaoxian Song, Changbin Yu

Memory is important for the agent to avoid repeating certain tasks unnecessarily and to adapt adequately to new scenes; therefore, we make use of meta-learning.

Meta-Learning Visual Navigation
