Search Results for author: Dongmei Jiang

Found 20 papers, 8 papers with code

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

no code implementations · 27 Feb 2025 · Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, Liqiang Nie

GOAP contains (1) an Action-guided Behavior Encoder that models causal relationships between observations and actions at each timestep, then dynamically interacts with the historical observation-action sequence, consolidating it into fixed-length behavior tokens, and (2) an MLLM that aligns behavior tokens with open-ended language instructions to predict actions auto-regressively.

Tasks: Large Language Model, Minecraft, +1
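The "fixed-length behavior tokens" idea from the abstract can be illustrated with a cross-attention pooling sketch: a set of learned queries attends over the variable-length observation-action history and compresses it to a fixed number of tokens. The learnable-query mechanism below is an assumption for illustration, not the paper's exact encoder:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def consolidate_history(history, queries):
    """Cross-attention pooling: a variable-length history (T, d) is
    compressed into a fixed number of behavior tokens (K, d)."""
    d = history.shape[-1]
    attn = softmax(queries @ history.T / np.sqrt(d))  # (K, T) attention weights
    return attn @ history                             # (K, d) behavior tokens

rng = np.random.default_rng(0)
tokens = consolidate_history(rng.normal(size=(57, 16)),  # 57 timesteps
                             rng.normal(size=(8, 16)))   # 8 learned queries
print(tokens.shape)  # (8, 16) regardless of history length
```

Because the output size depends only on the number of queries, the downstream MLLM always receives the same number of behavior tokens however long the episode is.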

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

no code implementations · 25 Jan 2025 · Weikang Meng, Yadan Luo, Xin Li, Dongmei Jiang, Zheng Zhang

Linear attention has emerged as a promising alternative to softmax-based attention, leveraging kernelized feature maps to reduce complexity from quadratic to linear in sequence length.
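The complexity reduction described here comes from applying a kernel feature map and reordering the matrix products so the (n, n) score matrix is never formed. A minimal NumPy sketch of generic kernelized linear attention, using a ReLU feature map as a stand-in (PolaFormer's polarity-aware map is more elaborate):

```python
import numpy as np

def feature_map(x):
    # ReLU keeps scores non-negative; a common kernel choice,
    # not PolaFormer's polarity-aware map.
    return np.maximum(x, 0.0)

def linear_attention(Q, K, V, eps=1e-6):
    """O(n) attention: associativity lets us compute K'^T V once,
    a small (d, d_v) matrix, instead of the (n, n) score matrix."""
    Qp, Kp = feature_map(Q), feature_map(K)
    kv = Kp.T @ V                      # (d, d_v)
    z = Qp @ Kp.sum(axis=0) + eps      # (n,) per-row normalizer
    return (Qp @ kv) / z[:, None]

rng = np.random.default_rng(1)
n, d = 1024, 32
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(out.shape)  # (1024, 32)
```

The cost is O(n·d·d_v) rather than O(n²·d), which is what makes the sequence-length scaling linear.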

CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation

1 code implementation · 20 Jan 2025 · Zheng Chong, Wenqing Zhang, Shiyue Zhang, Jun Zheng, Xiao Dong, Haoxiang Li, Yiling Wu, Dongmei Jiang, Xiaodan Liang

Comprehensive experiments demonstrate that CatV2TON outperforms existing methods in both image and video try-on tasks, offering a versatile and reliable solution for realistic virtual try-ons across diverse scenarios.

Tasks: Video Generation, Virtual Try-on

Transferable Adversarial Face Attack with Text Controlled Attribute

2 code implementations · 16 Dec 2024 · Wenyun Li, Zheng Zhang, Xiangyuan Lan, Dongmei Jiang

Extensive experiments on two high-resolution face recognition datasets validate that our TCA² method can generate natural text-guided adversarial impersonation faces with high transferability.

Tasks: Attribute, Face Recognition

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

no code implementations · 1 Dec 2024 · Yan Li, Yifei Xing, Xiangyuan Lan, Xin Li, Haifeng Chen, Dongmei Jiang

Extensive experiments on complete and incomplete multimodal fusion tasks demonstrate the effectiveness and efficiency of the proposed method.

Tasks: cross-modal alignment, Mamba

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

no code implementations · 19 Nov 2024 · Zhehan Kan, Ce Zhang, Zihan Liao, Yapeng Tian, Wenming Yang, Junyuan Xiao, Xu Li, Dongmei Jiang, YaoWei Wang, Qingmin Liao

Large Vision-Language Model (LVLM) systems have demonstrated impressive vision-language reasoning capabilities but suffer from pervasive and severe hallucination issues, posing significant risks in critical domains such as healthcare and autonomous systems.

Tasks: Hallucination, Language Modeling, +3

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

no code implementations · 8 Oct 2024 · Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, YaoWei Wang

In this work, we propose Empowering Multi-modal Mamba with Structural and Hierarchical Alignment (EMMA), which enables the MLLM to extract fine-grained visual information.

Tasks: cross-modal alignment, Hallucination, +1

ExpLLM: Towards Chain of Thought for Facial Expression Recognition

no code implementations · 4 Sep 2024 · Xing Lan, Jian Xue, Ji Qi, Dongmei Jiang, Ke Lu, Tat-Seng Chua

Specifically, we have designed the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion.

Tasks: Facial Expression Recognition (FER)
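A hypothetical prompt template following the three CoT perspectives named in the abstract (the paper's actual wording and interface may differ):

```python
def build_expllm_style_prompt(face_description):
    """Illustrative chain-of-thought prompt for expression recognition,
    structured around the three perspectives the abstract names."""
    return (
        "Analyze the facial expression step by step.\n"
        f"Face: {face_description}\n"
        "1. Key observations: describe the visible cues "
        "(brows, eyes, mouth).\n"
        "2. Overall emotional interpretation: combine the observations "
        "into a likely emotional state.\n"
        "3. Conclusion: output a single expression label.\n"
    )

prompt = build_expllm_style_prompt(
    "raised inner brows, wide eyes, slightly open mouth")
print(prompt)
```

Structuring the prompt this way forces the model to commit to low-level observations before naming an emotion, which is the point of applying CoT to FER.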

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

1 code implementation · 7 Aug 2024 · Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, Liqiang Nie

On top of the Hybrid Multimodal Memory module, a multimodal agent, Optimus-1, is constructed with a dedicated Knowledge-Guided Planner and Experience-Driven Reflector, contributing to better planning and reflection in the face of long-horizon tasks in Minecraft.

Tasks: Attribute, In-Context Learning, +2

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

1 code implementation · 10 Jul 2024 · Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, YaoWei Wang, Xiangyuan Lan, Xiaodan Liang

To address these challenges, we propose a novel unified open-vocabulary detection method called OV-DINO, which is pre-trained on diverse large-scale datasets with language-aware selective fusion in a unified framework.

Ranked #5 on Zero-Shot Object Detection on MSCOCO (AP metric, using extra training data)

Tasks: Zero-Shot Object Detection

Prompt Customization for Continual Learning

1 code implementation · 28 Apr 2024 · Yong Dai, Xiaopeng Hong, Yabin Wang, Zhiheng Ma, Dongmei Jiang, YaoWei Wang

In contrast to conventional methods that employ hard prompt selection, PGM assigns different coefficients to prompts from a fixed-sized pool of prompts and generates tailored prompts.

Tasks: Continual Learning, Incremental Learning
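The soft selection PGM performs, coefficients over a fixed prompt pool rather than a hard pick, can be sketched as a weighted sum. Names and shapes below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate_prompt(pool, logits):
    """Soft selection: instead of choosing one prompt from the pool
    (hard selection), blend all of them with input-dependent coefficients."""
    coeffs = softmax(logits)                   # (P,) one weight per prompt
    return np.tensordot(coeffs, pool, axes=1)  # (L, d) tailored prompt

rng = np.random.default_rng(2)
pool = rng.normal(size=(10, 5, 64))   # pool of 10 prompts, length 5, dim 64
logits = rng.normal(size=10)          # e.g. produced by a query network
prompt = generate_prompt(pool, logits)
print(prompt.shape)  # (5, 64)
```

Because the blend is differentiable, the coefficients can be learned end-to-end, unlike a hard top-k prompt lookup.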

Deep Homography Estimation for Visual Place Recognition

1 code implementation · 25 Feb 2024 · Feng Lu, Shuting Dong, Lijun Zhang, Bingxi Liu, Xiangyuan Lan, Dongmei Jiang, Chun Yuan

Moreover, we design a re-projection error loss over inliers to train the DHE network without additional homography labels; the network can also be trained jointly with the backbone to help it extract features better suited to local matching.

Tasks: Homography Estimation, Re-Ranking, +1
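The inlier re-projection loss can be illustrated with a small sketch: warp source keypoints by the predicted homography and measure the distance to their matched points. `reprojection_error` is a hypothetical helper for illustration, not the paper's code:

```python
import numpy as np

def reprojection_error(H, pts_src, pts_dst):
    """Mean distance between H-warped source points and their matches.
    Minimizing this over inlier correspondences supervises a homography
    regressor without ground-truth homography labels."""
    ones = np.ones((len(pts_src), 1))
    proj = np.hstack([pts_src, ones]) @ H.T   # homogeneous warp
    proj = proj[:, :2] / proj[:, 2:3]         # dehomogenize
    return np.linalg.norm(proj - pts_dst, axis=1).mean()

# A pure-translation homography reprojects exactly, so the error is 0.
H = np.array([[1., 0., 3.],
              [0., 1., -2.],
              [0., 0., 1.]])
src = np.array([[0., 0.], [10., 5.], [4., 7.]])
dst = src + np.array([3., -2.])
print(reprojection_error(H, src, dst))  # 0.0
```

The loss needs only point correspondences (which local matching already produces), which is why no homography labels are required.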

Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought

no code implementations · 12 Jan 2024 · Zaijing Li, Gongwei Chen, Rui Shao, Yuquan Xie, Dongmei Jiang, Liqiang Nie

In this paper, we propose the Emotional Chain-of-Thought (ECoT), a plug-and-play prompting method that enhances the performance of LLMs on various emotional generation tasks by aligning with human emotional intelligence guidelines.

Tasks: Emotional Intelligence, Emotion Recognition, +1

Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features

no code implementations · 13 Aug 2023 · Yi Zhang, Jitao Sang, Junyang Wang, Dongmei Jiang, YaoWei Wang

To this end, we propose Shortcut Debiasing: first transfer the target task's learning of bias attributes from bias features to shortcut features, and then employ causal intervention to eliminate the shortcut features during inference.

Tasks: Fairness

Strip-MLP: Efficient Token Interaction for Vision MLP

1 code implementation · ICCV 2023 · Guiping Cao, Shengda Luo, Wenjian Huang, Xiangyuan Lan, Dongmei Jiang, YaoWei Wang, JianGuo Zhang

Finally, based on the Strip MLP layer, we propose a novel Local Strip Mixing Module (LSMM) to boost the token interaction power in the local region.

Positional-Spectral-Temporal Attention in 3D Convolutional Neural Networks for EEG Emotion Recognition

no code implementations · 13 Oct 2021 · Jiyao Liu, Yanxi Zhao, Hao Wu, Dongmei Jiang

The proposed module, denoted by PST-Attention, consists of Positional, Spectral and Temporal Attention modules to explore more discriminative EEG features.

Tasks: EEG, EEG Emotion Recognition

Efficient Spatialtemporal Context Modeling for Action Recognition

no code implementations · 20 Mar 2021 · Congqi Cao, Yue Lu, Yifan Zhang, Dongmei Jiang, Yanning Zhang

Inspired by the 2D criss-cross attention used in segmentation tasks, we propose a recurrent 3D criss-cross attention (RCCA-3D) module to model the dense long-range spatiotemporal contextual information in video for action recognition.

Tasks: Action Recognition, Relation

Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization

no code implementations · 28 Nov 2016 · Meshia Cédric Oveneke, Mitchel Aliosha-Perez, Yong Zhao, Dongmei Jiang, Hichem Sahli

The omnipresence of deep learning architectures such as deep convolutional neural networks (CNN)s is fueled by the synergistic combination of ever-increasing labeled datasets and specialized hardware.
