Search Results for author: Yunhai Tong

Found 68 papers, 45 papers with code

Enhancing Self-Attention with Knowledge-Assisted Attention Maps

no code implementations NAACL 2022 Jiangang Bai, Yujing Wang, Hong Sun, Ruonan Wu, Tianmeng Yang, Pengfei Tang, Defu Cao, Mingliang Zhang1, Yunhai Tong, Yaming Yang, Jing Bai, Ruofei Zhang, Hao Sun, Wei Shen

Large-scale pre-trained language models have attracted extensive attentions in the research community and shown promising results on various tasks of natural language processing.

Multi-Task Learning Natural Language Understanding

Generative Classifier for Domain Generalization

no code implementations3 Apr 2025 Shaocong Long, Qianyu Zhou, Xiangtai Li, Chenhao Ying, Yunhai Tong, Lizhuang Ma, Yuan Luo, DaCheng Tao

To address these issues, we explore the theoretical implications of relying on domain invariance, revealing the crucial role of domain-specific information in mitigating the target risk for DG.

Blocking Domain Generalization +1

Training-free Diffusion Acceleration with Bottleneck Sampling

no code implementations24 Mar 2025 Ye Tian, Xin Xia, Yuxi Ren, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Yunhai Tong, Ling Yang, Bin Cui

Bottleneck Sampling follows a high-low-high denoising workflow: it performs high-resolution denoising in the initial and final stages while operating at lower resolutions in intermediate steps.

Denoising Image Generation +1

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

1 code implementation21 Mar 2025 Qingyu Shi, Jianzong Wu, Jinbin Bai, Jiangning Zhang, Lu Qi, Xiangtai Li, Yunhai Tong

In contrast, state-of-the-art video Diffusion Transformers (DiT) models use 3D full attention, which does not explicitly separate temporal and spatial information.

Benchmarking Video Generation

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

1 code implementation17 Feb 2025 Ye Tian, Ling Yang, Xinchen Zhang, Yunhai Tong, Mengdi Wang, Bin Cui

We propose Diffusion-Sharpening, a fine-tuning approach that enhances downstream alignment by optimizing sampling trajectories.

Denoising

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

1 code implementation8 Jan 2025 Yikang Zhou, Tao Zhang, Shilin Xu, Shihao Chen, Qianyu Zhou, Yunhai Tong, Shunping Ji, Jiangning Zhang, Xiangtai Li, Lu Qi

Recent advancements in multimodal models have shown a strong ability in visual perception, reasoning abilities, and vision-language understanding.

Contrastive Learning

Training-free Heterogeneous Graph Condensation via Data Selection

1 code implementation20 Dec 2024 Yuxuan Liang, Wentao Zhang, Xinyi Gao, Ling Yang, Chong Chen, Hongzhi Yin, Yunhai Tong, Bin Cui

The second is low efficiency, HGCond follows the existing GC methods designed for homogeneous graphs and leverages the sophisticated optimization paradigm, resulting in a time-consuming condensing procedure.

Graph Generation

Towards Scalable and Deep Graph Neural Networks via Noise Masking

no code implementations19 Dec 2024 Yuxuan Liang, Wentao Zhang, Zeang Sheng, Ling Yang, Quanqing Xu, Jiawei Jiang, Yunhai Tong, Bin Cu

Unlike the previous model-simplification works, we focus on continuous P and found that the noise existing inside each P is the cause of the over-smoothing issue, and use the efficient masking mechanism to eliminate them.

Graph Mining

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

no code implementations10 Dec 2024 Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong

Story visualization, the task of creating visual narratives from textual descriptions, has seen progress with text-to-image generation models.

Language Modelling Large Language Model +3

RelationBooth: Towards Relation-Aware Customized Object Generation

no code implementations30 Oct 2024 Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li, Ming-Husan Yang

Instead, this work addresses that gap by focusing on relation-aware customized image generation, which aims to preserve the identities from image prompts while maintaining the predicate relations described in text prompts.

Image Generation Object +1

MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

1 code implementation12 Oct 2024 Yaming Yang, Dilxat Muhtar, Yelong Shen, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Denvy Deng, Feng Sun, Qi Zhang, Weizhu Chen, Yunhai Tong

Parameter-efficient fine-tuning (PEFT) has been widely employed for domain adaptation, with LoRA being one of the most prominent methods due to its simplicity and effectiveness.

Domain Adaptation Multi-Task Learning +2

Direct Preference Optimization for LLM-Enhanced Recommendation Systems

no code implementations8 Oct 2024 Chao Sun, Yaobo Liang, Yaming Yang, Shilin Xu, Tianmeng Yang, Yunhai Tong

Large Language Models (LLMs) have exhibited remarkable performance across a wide range of domains, motivating research into their potential for recommendation systems.

In-Context Learning Instruction Following +4

You Can't Ignore Either: Unifying Structure and Feature Denoising for Robust Graph Learning

1 code implementation1 Aug 2024 Tianmeng Yang, Jiahao Meng, Min Zhou, Yaming Yang, Yujing Wang, Xiangtai Li, Yunhai Tong

However, the noises and attacks may come from both structures and features in graphs, making the graph denoising a dilemma and challenging problem.

Denoising Graph Learning

LLAVADI: What Matters For Multimodal Large Language Models Distillation

no code implementations28 Jul 2024 Shilin Xu, Xiangtai Li, Haobo Yuan, Lu Qi, Yunhai Tong, Ming-Hsuan Yang

The recent surge in Multimodal Large Language Models (MLLMs) has showcased their remarkable potential for achieving generalized intelligence by integrating visual understanding into Large Language Models. Nevertheless, the sheer model size of MLLMs leads to substantial memory and computational demands that hinder their widespread deployment.

Knowledge Distillation

MotionBooth: Motion-Aware Customized Text-to-Video Generation

no code implementations25 Jun 2024 Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements.

Text-to-Video Generation Video Generation

SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

no code implementations17 Jun 2024 Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked.

Fraud Detection Triplet

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

1 code implementation30 May 2024 Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang

For image synthesis, we propose a finite perturbation approach to enhance the diversity of generated results without changing the semantic categories.

Diversity Image Generation +2

VG4D: Vision-Language Model Goes 4D Video Recognition

1 code implementation17 Apr 2024 Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu

By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance.

Action Recognition Autonomous Driving +4

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

1 code implementation2 Oct 2023 Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yunhai Tong, Chen Change Loy

We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, and re-training.

Novel Object Detection Object +6

Mitigating Semantic Confusion from Hostile Neighborhood for Graph Active Learning

1 code implementation17 Aug 2023 Tianmeng Yang, Min Zhou, Yujing Wang, Zhengjie Lin, Lujia Pan, Bin Cui, Yunhai Tong

Graph Active Learning (GAL), which aims to find the most informative nodes in graphs for annotation to maximize the Graph Neural Networks (GNNs) performance, has attracted many research efforts but remains non-trivial challenges.

Active Learning Diversity +1

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

1 code implementation3 Jan 2023 Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao

Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.

Panoptic Segmentation Segmentation

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

2 code implementations ICCV 2023 Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy

Experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS) demonstrate the superiority of the CGG.

Caption Generation Instance Segmentation +2

Convolution-enhanced Evolving Attention Networks

1 code implementation16 Dec 2022 Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps.

Image Classification Machine Translation +3

TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers

3 code implementations13 Jan 2022 Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, DaCheng Tao

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors.

Ranked #6 on Video Object Detection on ImageNet VID (using extra training data)

Object object-detection +2

Graph Pointer Neural Networks

no code implementations3 Oct 2021 Tianmeng Yang, Yujing Wang, Zhihan Yue, Yaming Yang, Yunhai Tong, Jing Bai

On the one hand, multi-hop-based approaches do not explicitly distinguish relevant nodes from a large number of multi-hop neighborhoods, leading to a severe over-smoothing problem.

Node Classification

Competence-based Curriculum Learning for Multilingual Machine Translation

no code implementations Findings (EMNLP) 2021 Mingliang Zhang, Fandong Meng, Yunhai Tong, Jie zhou

Therefore, we focus on balancing the learning competencies of different languages and propose Competence-based Curriculum Learning for Multilingual Machine Translation, named CCL-M.

Machine Translation Translation

Global Aggregation then Local Distribution for Scene Parsing

1 code implementation28 Jul 2021 Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Xiatian Zhu, Tao Xiang

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation.

Scene Parsing Segmentation +1

Improving Video Instance Segmentation via Temporal Pyramid Routing

1 code implementation28 Jul 2021 Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.

Instance Segmentation Panoptic Segmentation +2

Customizing Graph Neural Networks using Path Reweighting

2 code implementations21 Jun 2021 Jianpeng Chen, Yujing Wang, Ming Zeng, Zongyi Xiang, Bitan Hou, Yunhai Tong, Ole J. Mengshoel, Yazhou Ren

Specifically, the proposed CustomGNN can automatically learn the high-level semantics for specific downstream tasks to highlight semantically relevant paths as well to filter out task-irrelevant noises in a graph.

Data Augmentation Graph Attention +2

TS2Vec: Towards Universal Representation of Time Series

4 code implementations19 Jun 2021 Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, Bixiong Xu

Furthermore, to obtain the representation of an arbitrary sub-sequence in the time series, we can apply a simple aggregation over the representations of corresponding timestamps.

Contrastive Learning Time Series +3

Fast and Accurate Scene Parsing via Bi-direction Alignment Networks

1 code implementation25 May 2021 Yanran Wu, Xiangtai Li, Chen Shi, Yunhai Tong, Yang Hua, Tao Song, Ruhui Ma, Haibing Guan

Motivated by this, we propose a novel network by aligning two-path information into each other through a learned flow field.

Scene Parsing Semantic Segmentation

BoundarySqueeze: Image Segmentation as Boundary Squeezing

1 code implementation25 May 2021 Hao He, Xiangtai Li, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lubin Weng, Zhouchen Lin, Shiming Xiang

This module is used to squeeze the object boundary from both inner and outer directions, which contributes to precise mask representation.

Image Segmentation Instance Segmentation +2

Dynamic Dual Sampling Module for Fine-Grained Semantic Segmentation

no code implementations25 May 2021 Chen Shi, Xiangtai Li, Yanran Wu, Yunhai Tong, Yi Xu

Representation of semantic context and local details is the essential issue for building modern semantic segmentation models.

Segmentation Semantic Segmentation

End-to-End Video Object Detection with Spatial-Temporal Transformers

1 code implementation23 May 2021 Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang

Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors.

Object object-detection +2

PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

1 code implementation CVPR 2021 Xiangtai Li, Hao He, Xia Li, Duo Li, Guangliang Cheng, Jianping Shi, Lubin Weng, Yunhai Tong, Zhouchen Lin

Experimental results on three different aerial segmentation datasets suggest that the proposed method is more effective and efficient than state-of-the-art general semantic segmentation methods.

Image Segmentation Segmentation +1

Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

1 code implementation EACL 2021 Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, Yunhai Tong

Pre-trained language models like BERT achieve superior performances in various NLP tasks without explicit consideration of syntactic information.

Natural Language Understanding

Evolving Attention with Residual Convolutions

2 code implementations20 Feb 2021 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers.

Image Classification Machine Translation +2

Predictive Attention Transformer: Improving Transformer with Attention Map Prediction

no code implementations1 Jan 2021 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Yunhai Tong

Instead, we model their dependencies via a chain of prediction models that take previous attention maps as input to predict the attention maps of a new layer through convolutional neural networks.

de-en Machine Translation +1

Towards Efficient Scene Understanding via Squeeze Reasoning

1 code implementation6 Nov 2020 Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin

Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector where the computation cost can be significantly reduced.

Instance Segmentation object-detection +4

Boundary Content Graph Neural Network for Temporal Action Proposal Generation

no code implementations ECCV 2020 Yueran Bai, Yingying Wang, Yunhai Tong, Yang Yang, Qiyue Liu, Junhui Liu

To address this issue, we propose a novel Boundary Content Graph Neural Network (BC-GNN) to model the insightful relations between the boundary and action content of temporal proposals by the graph neural networks.

Action Detection Action Understanding +2

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

2 code implementations ECCV 2020 Xiangtai Li, Xia Li, Li Zhang, Guangliang Cheng, Jianping Shi, Zhouchen Lin, Shaohua Tan, Yunhai Tong

Our insight is that appealing performance of semantic segmentation requires \textit{explicitly} modeling the object \textit{body} and \textit{edge}, which correspond to the high and low frequency of the image.

Object Segmentation +1

LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression

no code implementations COLING 2020 Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai

BERT is a cutting-edge language representation model pre-trained by a large corpus, which achieves superior performances on various natural language understanding tasks.

Blocking Knowledge Distillation +2

Improving BERT with Self-Supervised Attention

1 code implementation8 Apr 2020 Yiren Chen, Xiaoyu Kou, Jiangang Bai, Yunhai Tong

One of the most popular paradigms of applying large pre-trained NLP models such as BERT is to fine-tune it on a smaller dataset.

Sentence

Global Aggregation then Local Distribution in Fully Convolutional Networks

2 code implementations16 Sep 2019 Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong

GALD is end-to-end trainable and can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and consistently improves the performance of state-of-the-art object detection and instance segmentation approaches.

Instance Segmentation object-detection +4

Dual Graph Convolutional Network for Semantic Segmentation

6 code implementations13 Sep 2019 Li Zhang, Xiangtai Li, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H. S. Torr

Exploiting long-range contextual information is key for pixel-wise prediction tasks such as semantic segmentation.

Semantic Segmentation

Reprojection R-CNN: A Fast and Accurate Object Detector for 360° Images

no code implementations27 Jul 2019 Pengyu Zhao, Ansheng You, Yuanxing Zhang, Jiaying Liu, Kaigui Bian, Yunhai Tong

Specifically, we adapt the terminologies of the traditional object detection task to the omnidirectional scenarios, and propose a novel two-stage object detector, i. e., Reprojection R-CNN by combining both ERP and perspective projection.

ERP Object +3

GFF: Gated Fully Fusion for Semantic Segmentation

2 code implementations3 Apr 2019 Xiangtai Li, Houlong Zhao, Lei Han, Yunhai Tong, Kuiyuan Yang

Semantic segmentation generates comprehensive understanding of scenes through densely predicting the category for each pixel.

Scene Understanding Segmentation +1

Cannot find the paper you are looking for? You can Submit a new open access paper.