Search Results for author: Tao Jin

Found 35 papers, 15 papers with code

Prior Knowledge and Memory Enriched Transformer for Sign Language Translation

no code implementations Findings (ACL) 2022 Tao Jin, Zhou Zhao, Meng Zhang, Xingshan Zeng

This paper attacks the challenging problem of sign language translation (SLT), which involves not only visual and textual understanding but also additional prior knowledge learning (i.e., performing style, syntax).

Decoder POS +3

ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation

no code implementations 13 Mar 2025 Zirun Guo, Tao Jin

In this paper, we investigate concept forgetting and concept confusion in continual customization.

Text-to-Image Generation

Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises

1 code implementation 4 Mar 2025 Zirun Guo, Tao Jin

To address this challenging problem, we propose two novel strategies: sample identification with interquartile range Smoothing and unimodal assistance, and Mutual information sharing (SuMi).

Test-time Adaptation

Efficient Prompting for Continual Adaptation to Missing Modalities

no code implementations 1 Mar 2025 Zirun Guo, Shulei Wang, Wang Lin, Weicai Yan, Yangyang Wu, Tao Jin

In this paper, we formulate the dynamic missing modality problem as a continual learning task and introduce the continual multimodal missing modality task.

Continual Learning

Low-rank Prompt Interaction for Continual Vision-Language Retrieval

1 code implementation 24 Jan 2025 Weicai Yan, Ye Wang, Wang Lin, Zirun Guo, Zhou Zhao, Tao Jin

Considering that the training parameters scale to the number of layers and tasks, we propose low-rank interaction-augmented decomposition to avoid memory explosion while enhancing the cross-modal association through sharing and separating common-specific low-rank factors.

Continual Learning Contrastive Learning +1

Speech Watermarking with Discrete Intermediate Representations

no code implementations 18 Dec 2024 Shengpeng Ji, Ziyue Jiang, Jialong Zuo, Minghui Fang, Yifu Chen, Tao Jin, Zhou Zhao

In this paper, we propose DiscreteWM, a novel speech watermarking framework that injects watermarks into the discrete intermediate representations of speech.

Voice Cloning

A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter

1 code implementation 12 Dec 2024 Zirun Guo, Xize Cheng, Yangyang Wu, Tao Jin

With these designs, Wander enables token-level interactions between sequences of different modalities in a parameter-efficient way.

Transfer Learning

Bridging the Gap for Test-Time Multimodal Sentiment Analysis

1 code implementation 10 Dec 2024 Zirun Guo, Tao Jin, Wenlong Xu, Wang Lin, Yangyang Wu

However, in real-world dynamic scenarios, the distribution of target data is always changing and different from the source data used to train the model, which leads to performance degradation.

Multimodal Sentiment Analysis Pseudo Label +1

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

1 code implementation 3 Nov 2024 Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Existing methods to balance the training process always have some limitations on the loss functions, optimizers and the number of modalities and only consider modulating the magnitude of the gradients while ignoring the directions of the gradients.

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

1 code implementation 7 Jul 2024 Zirun Guo, Tao Jin, Zhou Zhao

The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition.

Emotion Recognition Multimodal Sentiment Analysis

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

1 code implementation 20 Jun 2024 Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong

Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem.

Retrieval Sequential Recommendation

Non-confusing Generation of Customized Concepts in Diffusion Models

no code implementations 11 May 2024 Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang

We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs).

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

1 code implementation 8 May 2024 Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao

In this work, we propose FreeBind, an idea that treats multimodal representation spaces as basic units, and freely augments pre-trained unified space by integrating knowledge from extra expert spaces via "space bonds".

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

1 code implementation 18 Mar 2024 Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly.

Attribute Decoder +1

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

no code implementations CVPR 2024 Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe

Simulation is an invaluable tool for radio-frequency system designers that enables rapid prototyping of various algorithms for imaging, target detection, classification, and tracking.

Novel View Synthesis

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

no code implementations 23 Dec 2023 Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao

However, talking head translation, converting audio-visual speech (i.e., talking head video) from one language into another, still confronts several challenges compared to audio speech: (1) Existing methods invariably rely on cascading, synthesizing via both audio and text, resulting in delays and cascading errors.

es-en fr-en +3

Extending Multi-modal Contrastive Representations

1 code implementation 13 Oct 2023 Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao

Inspired by recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method to flexibly learn unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces.

3D Object Classification Representation Learning +1

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

1 code implementation 2 Oct 2023 Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems.

Computational Efficiency Decision Making +2

Gloss Attention for Gloss-free Sign Language Translation

1 code implementation CVPR 2023 Aoxiong Yin, Tianyun Zhong, Li Tang, Weike Jin, Tao Jin, Zhou Zhao

We find that it can provide two aspects of information for the model, 1) it can help the model implicitly learn the location of semantic boundaries in continuous sign language videos, 2) it can help the model understand the sign language video globally.

Gloss-free Sign Language Translation Language Modeling +5

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment

1 code implementation 10 Jun 2023 Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao

We demonstrate that OpenSR enables modality transfer from one to any in three different settings (zero-, few- and full-shot), and achieves highly competitive zero-shot performance compared to the existing few-shot and full-shot lip-reading methods.

Audio-Visual Speech Recognition Lip Reading +2

Borda Regret Minimization for Generalized Linear Dueling Bandits

no code implementations 15 Mar 2023 Yue Wu, Tao Jin, Hao Lou, Farzad Farnoud, Quanquan Gu

To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$.

Recommendation Systems

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

2 code implementations ICCV 2023 Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao

However, despite researchers exploring cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of cross-lingual studies on visual speech.

Lip Reading Machine Translation +4

Exploring Group Video Captioning with Efficient Relational Approximation

no code implementations ICCV 2023 Wang Lin, Tao Jin, Ye Wang, Wenwen Pan, Linjun Li, Xize Cheng, Zhou Zhao

In this study, we propose a new task, group video captioning, which aims to infer the desired content among a group of target videos and describe it with another group of related reference videos.

Video Captioning

Generalizable Multi-linear Attention Network

no code implementations NeurIPS 2021 Tao Jin, Zhou Zhao

The majority of existing multimodal sequential learning methods focus on how to obtain effective representations and ignore the importance of multimodal fusion.

Multimodal Sentiment Analysis Retrieval +1

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

no code implementations 8 Oct 2021 Yue Wu, Tao Jin, Hao Lou, Pan Xu, Farzad Farnoud, Quanquan Gu

In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items.

SBAT: Video Captioning with Sparse Boundary-Aware Transformer

no code implementations 23 Jul 2020 Tao Jin, Siyu Huang, Ming Chen, Yingming Li, Zhongfei Zhang

However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps.

Machine Translation multimodal interaction +3

Rank Aggregation via Heterogeneous Thurstone Preference Models

1 code implementation 3 Dec 2019 Tao Jin, Pan Xu, Quanquan Gu, Farzad Farnoud

By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users.

Inpatient2Vec: Medical Representation Learning for Inpatients

no code implementations 18 Apr 2019 Ying Wang, Xiao Xu, Tao Jin, Xiang Li, Guotong Xie, Jian-Min Wang

In addition, for unordered medical activity sets, existing medical RL methods utilize a simple pooling strategy, which would result in indistinguishable contributions among the activities for learning.

Representation Learning Semantic Similarity +1

Online Bayesian Collaborative Topic Regression

no code implementations 28 May 2016 Chenghao Liu, Tao Jin, Steven C. H. Hoi, Peilin Zhao, Jianling Sun

In this paper, we propose a novel scheme of Online Bayesian Collaborative Topic Regression (OBCTR) which is efficient and scalable for learning from data streams.

Recommendation Systems regression
