1 code implementation • 2 Mar 2025 • Yiyang Liu, James Chenhao Liang, Ruixiang Tang, Yugyung Lee, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Lifu Huang, Dongfang Liu, Qifan Wang, Cheng Han
Multimodal instruction tuning has proven to be an effective strategy for achieving zero-shot generalization by fine-tuning pre-trained Large Multimodal Models (LMMs) with instruction-following data.
no code implementations • 7 Jan 2025 • Dong Hyun Jeon, Wenbo Sun, Houbing Herbert Song, Dongfang Liu, Alvaro Velasquez, Yixin Chloe Xie, Shuteng Niu
This study introduces the Knowledge Graph Attention Network with Information Fusion (KGIF), a specialized framework designed to merge entity and relation embeddings explicitly through a tailored self-attention mechanism.
1 code implementation • 18 Nov 2024 • Taowen Wang, Cheng Han, James Chenhao Liang, Wenhao Yang, Dongfang Liu, Luna Xinyu Zhang, Qifan Wang, Jiebo Luo, Ruixiang Tang
In particular, we introduce two untargeted attack objectives that leverage spatial foundations to destabilize robotic actions, and a targeted attack objective that manipulates the robotic trajectory.
no code implementations • 9 Nov 2024 • Chong Zhang, Mingyu Jin, Dong Shu, Taowen Wang, Dongfang Liu, Xiaobo Jin
To solve this problem, we propose a target-driven black-box attack method that redefines the attack's goal as maximizing the KL divergence between the conditional probabilities of the clean text and the attack text.
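As a rough illustration of that objective (not the authors' implementation), the sketch below maximizes the KL divergence between two token-level conditional distributions; the shapes, the stand-in logits, and the loss wiring are assumptions for illustration only.

```python
# Hedged sketch: maximize KL(P(y | clean) || P(y | attack)) as an attack objective.
# Assumes access to per-token logits from some language model; names are illustrative.
import torch
import torch.nn.functional as F

def kl_attack_objective(clean_logits: torch.Tensor, attack_logits: torch.Tensor) -> torch.Tensor:
    """Return a loss whose minimization *increases* the divergence between the
    clean and attacked conditional token distributions.
    Both tensors have shape (seq_len, vocab_size)."""
    clean_probs = F.softmax(clean_logits, dim=-1)             # P(y | clean text)
    attack_log_probs = F.log_softmax(attack_logits, dim=-1)   # log P(y | attack text)
    # F.kl_div(input=log Q, target=P) computes KL(P || Q).
    kl = F.kl_div(attack_log_probs, clean_probs, reduction="batchmean")
    return -kl  # negate so a gradient-descent step pushes the divergence up

# Toy usage with random logits standing in for model outputs.
clean = torch.randn(8, 32000)
attack = torch.randn(8, 32000, requires_grad=True)
loss = kl_attack_objective(clean, attack)
loss.backward()  # gradients flow to the attack-side logits
```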
1 code implementation • 2 Nov 2024 • Runjia Zeng, Cheng Han, Qifan Wang, Chunshu Wu, Tong Geng, Lifu Huang, Ying Nian Wu, Dongfang Liu
To address this challenge, we draw inspiration from human visual cognition, and propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
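The core idea, as stated, is to inject Fourier-transformed components into the visual prompts. The sketch below is a minimal, assumption-laden illustration of that idea, not the VFPT reference implementation; the prompt count, mixing ratio, and use of the real FFT component are all choices made here for brevity.

```python
# Minimal sketch of Fourier-augmented visual prompts (assumptions, not official code):
# a 2D FFT is applied over part of the learnable prompts, and all prompts are
# prepended to the patch tokens of a ViT-style backbone.
import torch
import torch.nn as nn

class FourierVisualPrompt(nn.Module):
    def __init__(self, num_prompts: int = 10, hidden_dim: int = 768, fourier_ratio: float = 0.5):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, hidden_dim) * 0.02)
        self.num_fourier = int(num_prompts * fourier_ratio)  # how many prompts get the FFT

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, hidden_dim)
        fourier_part = torch.fft.fft2(self.prompts[: self.num_fourier]).real  # frequency-domain prompts
        plain_part = self.prompts[self.num_fourier:]
        prompts = torch.cat([fourier_part, plain_part], dim=0)
        prompts = prompts.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        return torch.cat([prompts, patch_tokens], dim=1)  # prepend prompts to the token sequence

tokens = torch.randn(2, 196, 768)           # ViT-style patch embeddings
print(FourierVisualPrompt()(tokens).shape)  # torch.Size([2, 206, 768])
```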
2 code implementations • 24 Sep 2024 • Taowen Wang, Yiyang Liu, James Chenhao Liang, Junhan Zhao, Yiming Cui, Yuning Mao, Shaoliang Nie, Jiahao Liu, Fuli Feng, Zenglin Xu, Cheng Han, Lifu Huang, Qifan Wang, Dongfang Liu
Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks.
1 code implementation • 16 Aug 2024 • Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Tong Geng, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu
With this novel design, we advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which contribute to its ability to emulate human-like cognitive processes in visual intelligence.
Ranked #198 on Visual Question Answering on MM-Vet
no code implementations • 10 Aug 2024 • Liqi Yan, Qifan Wang, Junhan Zhao, Qiang Guan, Zheng Tang, Jianhui Zhang, Dongfang Liu
First-Person-View (FPV) holds immense potential for revolutionizing Unmanned Aerial Vehicle (UAV) navigation, offering a compelling avenue for traversing complex building structures.
no code implementations • 3 Aug 2024 • Chuan Liu, Chunshu Wu, Shihui Cao, Mingkai Chen, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, Tong Geng
The rapid development of AI highlights the pressing need for sustainable energy, which has been a critical global challenge for decades.
no code implementations • 15 Jul 2024 • Mingkai Chen, Taowen Wang, Shihui Cao, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu
Controlled fusion energy is deemed pivotal for the advancement of human civilization.
no code implementations • 5 Jul 2024 • Cheng Han, Qifan Wang, Sohail A. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu
Transformer-based architectures have become the de facto standard models for diverse vision tasks owing to their superior performance.
no code implementations • 10 Jun 2024 • Li Yang, Qifan Wang, Jianfeng Chi, Jiahao Liu, Jingang Wang, Fuli Feng, Zenglin Xu, Yi Fang, Lifu Huang, Dongfang Liu
Specifically, we employ a heavy encoder to separately encode the product context and attribute.
1 code implementation • 9 Jun 2024 • Zhiyuan Cheng, Cheng Han, James Liang, Qifan Wang, Xiangyu Zhang, Dongfang Liu
Our experiments with two representative MDE networks demonstrate improved robustness against various adversarial attacks, with minimal impact on benign performance.
no code implementations • CVPR 2024 • Yawen Lu, Dongfang Liu, Qifan Wang, Cheng Han, Yiming Cui, Zhiwen Cao, Xueling Zhang, Yingjie Victor Chen, Heng Fan
We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion.
no code implementations • 3 Jun 2024 • Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu
In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective.
no code implementations • 29 May 2024 • Yiming Cui, Cheng Han, Dongfang Liu
Spatial global-local aggregation fuses the local information from neighboring frames and the global semantics from the current frame to eliminate feature degradation.
1 code implementation • 1 Apr 2024 • Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang
Our attack prototype, named BadPart, is evaluated on both MDE and OFE tasks, utilizing a total of 7 models.
no code implementations • CVPR 2024 • Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao
Correspondingly, a single text embedding may be insufficiently expressive to capture the video embedding and support retrieval.
1 code implementation • 23 Jan 2024 • Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu
As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has gained attention due to its superior performance compared to traditional full-finetuning.
no code implementations • 18 Jan 2024 • Cheng Han, James C. Liang, Qifan Wang, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Ying Nian Wu, Dongfang Liu
We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework.
1 code implementation • 1 Dec 2023 • Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan
Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy for achieving multimodal semantic segmentation, which is training-costly due to the massive parameter updates in feature extraction and fusion.
Ranked #6 on Semantic Segmentation on NYU Depth v2
no code implementations • 2 Nov 2023 • Yiming Cui, Cheng Han, Dongfang Liu
The advancement of computer vision has pushed visual analysis tasks from still images to the video domain.
1 code implementation • 22 Sep 2023 • James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu
This paper presents CLUSTERFORMER, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER.
1 code implementation • ICCV 2023 • Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu
Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning.
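The sketch below illustrates the general idea of prepending small learnable prompts to the attention keys and values while leaving the queries untouched; the dimensions, initialization, and single-head formulation are assumptions for brevity, not the paper's implementation.

```python
# Hedged sketch of key-value prompting inside self-attention: learnable prompt
# tensors are concatenated to K and V only, so the output keeps the original
# token count while the attention can also attend to the prompts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVPromptedAttention(nn.Module):
    def __init__(self, dim: int = 768, num_kv_prompts: int = 5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.key_prompts = nn.Parameter(torch.randn(num_kv_prompts, dim) * 0.02)
        self.value_prompts = nn.Parameter(torch.randn(num_kv_prompts, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        b = x.size(0)
        q, k, v = self.q(x), self.k(x), self.v(x)
        k = torch.cat([self.key_prompts.expand(b, -1, -1), k], dim=1)
        v = torch.cat([self.value_prompts.expand(b, -1, -1), v], dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        return attn @ v  # same number of output tokens as the input

print(KVPromptedAttention()(torch.randn(2, 197, 768)).shape)  # torch.Size([2, 197, 768])
```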
no code implementations • Findings of the Association for Computational Linguistics 2023 • Li Yang, Qifan Wang, Jingang Wang, Xiaojun Quan, Fuli Feng, Yu Chen, Madian Khabsa, Sinong Wang, Zenglin Xu, Dongfang Liu
In this work, we propose a novel prompt tuning approach with Mixed Prompts for few-shot Attribute Value Extraction, namely MixPAVE.
1 code implementation • 3 May 2023 • James Liang, Tianfei Zhou, Dongfang Liu, Wenguan Wang
We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i. e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.
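As a rough, assumption-laden illustration of what a neural clustering scheme over pixel embeddings can look like (not the CLUSTSEG implementation), the sketch below runs a few k-means-style EM iterations that alternately assign pixels to cluster centers and update the centers; the temperature and initialization are arbitrary.

```python
# Hedged sketch of an EM/k-means-style clustering step over pixel embeddings,
# loosely in the spirit of a unified clustering scheme for segmentation.
import torch
import torch.nn.functional as F

def cluster_pixels(pixel_feats: torch.Tensor, centers: torch.Tensor, iters: int = 3):
    """pixel_feats: (num_pixels, dim); centers: (num_clusters, dim).
    Returns hard assignments and the refined centers."""
    for _ in range(iters):
        # E-step: soft-assign each pixel to the most similar center.
        logits = F.normalize(pixel_feats, dim=-1) @ F.normalize(centers, dim=-1).t()
        assign = F.softmax(logits / 0.1, dim=-1)               # (num_pixels, num_clusters)
        # M-step: update centers as the assignment-weighted mean of pixels.
        centers = (assign.t() @ pixel_feats) / (assign.sum(0).unsqueeze(-1) + 1e-6)
    return logits.argmax(-1), centers

feats = torch.randn(64 * 64, 256)        # flattened feature map
init_centers = torch.randn(8, 256)       # e.g. instance/semantic queries
masks, centers = cluster_pixels(feats, init_centers)
print(masks.shape, centers.shape)        # torch.Size([4096]) torch.Size([8, 256])
```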
1 code implementation • 28 Apr 2023 • Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang
We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks.
no code implementations • CVPR 2023 • Yawen Lu, Qifan Wang, Siqi Ma, Tong Geng, Yingjie Victor Chen, Huaijin Chen, Dongfang Liu
Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement.
no code implementations • 12 Apr 2023 • Hongye Xu, Dongfang Liu, Cory Merkel, Michael Zuzak
If an incorrect secret key is used, a set of deterministic errors is produced in locked modules, restricting unauthorized use.
1 code implementation • 31 Jan 2023 • Zhiyuan Cheng, James Liang, Guanhong Tao, Dongfang Liu, Xiangyu Zhang
We improve adversarial robustness against physical-world attacks by using L0-norm-bounded perturbations during training.
1 code implementation • 3 Oct 2022 • Wenguan Wang, James Liang, Dongfang Liu
Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings.
1 code implementation • 15 Sep 2022 • Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu
We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers.
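For readers unfamiliar with the underlying classifier, the sketch below shows plain nearest-centroid classification over feature vectors; DNC itself learns the features end to end, and the Euclidean metric and shapes here are assumptions used only to make the idea concrete.

```python
# Hedged sketch: nearest-centroid classification over (learned) feature vectors.
# The parametric softmax head is replaced by distances to per-class centroids;
# here the centroids are simply class means of training features.
import torch

def class_centroids(feats: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    # feats: (n, dim); labels: (n,) -> (num_classes, dim)
    return torch.stack([feats[labels == c].mean(0) for c in range(num_classes)])

def predict(feats: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    # Assign each sample to the class whose centroid is closest in Euclidean distance.
    return torch.cdist(feats, centroids).argmin(dim=-1)

train_feats, train_labels = torch.randn(100, 128), torch.randint(0, 10, (100,))
centroids = class_centroids(train_feats, train_labels, num_classes=10)
print(predict(torch.randn(5, 128), centroids))  # 5 predicted class indices
```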
no code implementations • 19 Aug 2022 • Zhiwen Cao, Dongfang Liu, Qifan Wang, Yingjie Chen
In this paper, we propose an Anisotropic Spherical Gaussian (ASG)-based LDL approach for facial pose estimation.
2 code implementations • 11 Jul 2022 • Zhiyuan Cheng, James Liang, Hongjun Choi, Guanhong Tao, Zhiwen Cao, Dongfang Liu, Xiangyu Zhang
Experimental results show that our method can generate stealthy, effective, and robust adversarial patches for different target objects and models, achieving a mean depth estimation error of more than 6 meters and a 93% attack success rate (ASR) in object detection with a patch covering only 1/9 of the vehicle's rear area.
1 code implementation • 22 May 2022 • Liqi Yan, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu
Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description.
no code implementations • 5 Mar 2022 • Qifan Wang, Yi Fang, Anirudh Ravula, Ruining He, Bin Shen, Jingang Wang, Xiaojun Quan, Dongfang Liu
Network embedding is an effective technique to learn the low-dimensional representations of nodes in networks.
no code implementations • 1 Feb 2022 • Qifan Wang, Yi Fang, Anirudh Ravula, Fuli Feng, Xiaojun Quan, Dongfang Liu
Structure information extraction refers to the task of extracting structured text fields from web pages, such as extracting a product offer from a shopping page including product title, description, brand and price.
no code implementations • 15 Oct 2021 • Yiming Cui, Zhiwen Cao, Yixin Xie, Xingyu Jiang, Feng Tao, Yingjie Chen, Lin Li, Dongfang Liu
Existing MOTS studies face two critical challenges: 1) the published datasets inadequately capture the real-world complexity needed to train networks for various driving settings; 2) annotation tooling in the working pipeline is under-studied in the literature, which limits the quality of MOTS learning examples.
1 code implementation • ICCV 2021 • Yiming Cui, Liqi Yan, Zhiwen Cao, Dongfang Liu
One of the popular solutions is to exploit the temporal information and enhance per-frame representation through aggregating features from neighboring frames.
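A minimal sketch of that kind of temporal aggregation, a similarity-weighted average of current- and neighboring-frame features, is shown below; it illustrates the general idea rather than this paper's module, and the cosine weighting is an assumption.

```python
# Hedged sketch: enhance the current frame's per-location features with a
# similarity-weighted average over the current and neighboring frames.
import torch
import torch.nn.functional as F

def aggregate_frames(current: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
    """current: (num_locations, dim); neighbors: (num_frames, num_locations, dim)."""
    frames = torch.cat([current.unsqueeze(0), neighbors], dim=0)
    # Cosine similarity to the current frame decides each frame's weight per location.
    weights = F.softmax(
        torch.stack([F.cosine_similarity(current, f, dim=-1) for f in frames]), dim=0
    )  # (num_frames + 1, num_locations)
    return (weights.unsqueeze(-1) * frames).sum(0)  # enhanced per-frame representation

cur = torch.randn(196, 256)              # current-frame features (e.g. 14x14 locations)
nbrs = torch.randn(4, 196, 256)          # four neighboring support frames
print(aggregate_frames(cur, nbrs).shape) # torch.Size([196, 256])
```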
1 code implementation • CVPR 2021 • Dongfang Liu, Yiming Cui, Wenbo Tan, Yingjie Chen
Video instance segmentation (VIS) is a new and critical task in computer vision.
1 code implementation • 18 Feb 2021 • Liqi Yan, Yiming Cui, Yingjie Chen, Dongfang Liu
We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations.
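A hedged sketch of extracting multi-level feature maps from a standard CNN backbone and fusing them into a single image descriptor is given below; the ResNet-18 backbone, global average pooling, and fusion by concatenation are assumptions, not the paper's exact design.

```python
# Hedged sketch: pull feature maps from several stages of a torchvision ResNet
# and fuse them (pooled + concatenated) into one image-level descriptor.
import torch
import torchvision.models as models
from torchvision.models.feature_extraction import create_feature_extractor

backbone = models.resnet18(weights=None)
extractor = create_feature_extractor(
    backbone, return_nodes={"layer2": "c3", "layer3": "c4", "layer4": "c5"}
)

def hierarchical_descriptor(images: torch.Tensor) -> torch.Tensor:
    feats = extractor(images)                              # dict of multi-scale feature maps
    pooled = [f.mean(dim=(2, 3)) for f in feats.values()]  # global average pool each level
    return torch.cat(pooled, dim=1)                        # simple fusion by concatenation

print(hierarchical_descriptor(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 896])
```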
no code implementations • ICCV 2021 • Alireza Naghizadeh, Hongye Xu, Mohab Mohamed, Dimitris N. Metaxas, Dongfang Liu
The importance of this subject lies in the amount of training data that artificial neural networks need to accurately identify and segment objects in images, and in the infeasibility of acquiring a sufficient dataset within the biomedical field.
1 code implementation • 4 Dec 2020 • Dongfang Liu, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang, Yingjie Chen
In this work, we introduce a Denser Feature Network (DenserNet) for visual localization.
no code implementations • 14 Oct 2020 • Zhiwen Cao, Zongcheng Chu, Dongfang Liu, Yingjie Chen
This paper proposes to use the three vectors in a rotation matrix as the representation in head pose estimation and develops a new neural network based on the characteristics of this representation.
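To make the representation concrete, the sketch below converts (yaw, pitch, roll) Euler angles into a rotation matrix and exposes its three column vectors as the regression target; the angle convention chosen here is an assumption and may differ from the paper's.

```python
# Hedged sketch: represent head pose by the three column vectors of the rotation
# matrix built from (yaw, pitch, roll); a network would regress these nine
# numbers instead of the Euler angles directly.
import numpy as np

def euler_to_rotation_vectors(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Angles in radians; convention assumed as R = Rz(roll) @ Ry(yaw) @ Rx(pitch)."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    return R.T  # rows of R.T are the three column vectors of R

vectors = euler_to_rotation_vectors(0.3, -0.1, 0.05)
print(vectors.shape)  # (3, 3): each row is one unit vector of the head frame
```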
no code implementations • 1 Sep 2020 • Liqi Yan, Dongfang Liu, Yaoxian Song, Changbin Yu
Memory is important for the agent to avoid unnecessarily repeating certain tasks and to adapt adequately to new scenes; therefore, we make use of meta-learning.
Ranked #1 on Visual Navigation on AI2-THOR
no code implementations • 13 Aug 2020 • Dongfang Liu, Yiming Cui, Xiaolei Guo, Wei Ding, Baijian Yang, Yingjie Chen
It is a common practice for vehicles to use GPS to acquire location information.