no code implementations • Findings (ACL) 2022 • Tao Jin, Zhou Zhao, Meng Zhang, Xingshan Zeng
This paper attacks the challenging problem of sign language translation (SLT), which involves not only visual and textual understanding but also learning additional prior knowledge (i.e., performing style and syntax).
no code implementations • 13 Mar 2025 • Zirun Guo, Tao Jin
In this paper, we investigate concept forgetting and concept confusion in continual customization.
1 code implementation • 4 Mar 2025 • Zirun Guo, Tao Jin
To address this challenging problem, we propose two novel strategies: sample identification with interquartile range Smoothing and unimodal assistance, and Mutual information sharing (SuMi).
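The snippet names interquartile-range (IQR) based sample identification but does not give SuMi's exact rule. As a hedged illustration only, the sketch below applies the standard Tukey fences, flagging any sample whose per-sample score falls outside [Q1 - k·IQR, Q3 + k·IQR]. The function name `iqr_flags`, the use of a per-sample loss as the score, and k = 1.5 are assumptions, and the smoothing step is not reproduced.

```python
from statistics import quantiles


def iqr_flags(scores, k=1.5):
    """Flag scores outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: `scores` is any per-sample statistic (e.g., a loss);
    k = 1.5 is the conventional Tukey multiplier, assumed here.
    """
    # "inclusive" quartiles interpolate linearly between order statistics.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [s < lower or s > upper for s in scores]


losses = [0.9, 1.0, 1.1, 1.0, 0.95, 5.0]
print(iqr_flags(losses))  # only the 5.0 loss lies outside the fences
```

Because the fences depend on quartiles rather than the mean, a single extreme value does not shift them much, which is why IQR rules are a common choice for spotting atypical samples.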
no code implementations • 1 Mar 2025 • Zirun Guo, Shulei Wang, Wang Lin, Weicai Yan, Yangyang Wu, Tao Jin
In this paper, we formulate the dynamic missing modality problem as a continual learning task and introduce the continual multimodal missing modality task.
1 code implementation • 24 Jan 2025 • Weicai Yan, Ye Wang, Wang Lin, Zirun Guo, Zhou Zhao, Tao Jin
Considering that the number of training parameters scales with the number of layers and tasks, we propose low-rank interaction-augmented decomposition to avoid memory explosion while enhancing the cross-modal association by sharing and separating common and specific low-rank factors.
no code implementations • 2 Jan 2025 • Xize Cheng, Dongjie Fu, Xiaoda Yang, Minghui Fang, Ruofan Hu, Jingyu Lu, Bai Jionghao, Zehan Wang, Shengpeng Ji, Rongjie Huang, Linjun Li, Yu Chen, Tao Jin, Zhou Zhao
We introduce ShareChatX, the first comprehensive, large-scale dataset for spoken dialogue that spans diverse scenarios.
no code implementations • 18 Dec 2024 • Shengpeng Ji, Ziyue Jiang, Jialong Zuo, Minghui Fang, Yifu Chen, Tao Jin, Zhou Zhao
In this paper, we propose DiscreteWM, a novel speech watermarking framework that injects watermarks into the discrete intermediate representations of speech.
1 code implementation • 12 Dec 2024 • Zirun Guo, Xize Cheng, Yangyang Wu, Tao Jin
With these designs, Wander enables token-level interactions between sequences of different modalities in a parameter-efficient way.
1 code implementation • 10 Dec 2024 • Zirun Guo, Tao Jin, Wenlong Xu, Wang Lin, Yangyang Wu
However, in real-world dynamic scenarios, the distribution of target data constantly changes and differs from that of the source data used to train the model, which leads to performance degradation.
1 code implementation • 3 Nov 2024 • Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao
Existing methods for balancing the training process impose limitations on the loss functions, optimizers, and number of modalities, and they modulate only the magnitude of the gradients while ignoring their direction.
no code implementations • 28 Oct 2024 • Xize Cheng, Siqi Zheng, Zehan Wang, Minghui Fang, Ziang Zhang, Rongjie Huang, Ziyang Ma, Shengpeng Ji, Jialong Zuo, Tao Jin, Zhou Zhao
Scaling up has brought tremendous success to the fields of vision and language in recent years.
1 code implementation • 7 Jul 2024 • Zirun Guo, Tao Jin, Zhou Zhao
The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition.
1 code implementation • 20 Jun 2024 • Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong
Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem.
no code implementations • 11 May 2024 • Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs).
1 code implementation • 8 May 2024 • Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao
In this work, we propose FreeBind, an approach that treats multimodal representation spaces as basic units and freely augments a pre-trained unified space by integrating knowledge from extra expert spaces via "space bonds".
1 code implementation • 18 Mar 2024 • Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao
Recent singing voice synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the ability to explicitly control the style attributes of the synthesized singing.
no code implementations • CVPR 2024 • Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe
Simulation is an invaluable tool for radio-frequency system designers that enables rapid prototyping of various algorithms for imaging, target detection, classification, and tracking.
no code implementations • CVPR 2024 • Jimin Xu, Tianbao Wang, Tao Jin, Shengyu Zhang, Dongjie Fu, Zhe Wang, Jiangjing Lyu, Chengfei Lv, Chaoyue Niu, Zhou Yu, Zhou Zhao, Fei Wu
Specifically, in the first stage, MPOD123 uses the pretrained view-conditioned diffusion model to guide the optimization of the outline shape of the 3D content.
no code implementations • 23 Dec 2023 • Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao
However, talking head translation, which converts audio-visual speech (i.e., talking head video) from one language into another, still faces several challenges compared to audio speech translation: (1) existing methods invariably rely on cascading, synthesizing via both audio and text, which results in delays and cascading errors.
1 code implementation • 13 Oct 2023 • Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao
Inspired by the recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method to flexibly learn a unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces.
1 code implementation • 2 Oct 2023 • Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems.
1 code implementation • CVPR 2023 • Aoxiong Yin, Tianyun Zhong, Li Tang, Weike Jin, Tao Jin, Zhou Zhao
We find that it provides two kinds of information to the model: (1) it helps the model implicitly learn the location of semantic boundaries in continuous sign language videos, and (2) it helps the model understand the sign language video globally.
Ranked #7 on Gloss-free Sign Language Translation on PHOENIX14T
1 code implementation • 10 Jun 2023 • Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao
We demonstrate that OpenSR enables modality transfer from one to any in three different settings (zero-, few- and full-shot), and achieves highly competitive zero-shot performance compared to the existing few-shot and full-shot lip-reading methods.
no code implementations • CVPR 2023 • Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao
Then, we present two cooperative seekers to simultaneously search the image for PR and localize the product for PG.
no code implementations • 15 Mar 2023 • Yue Wu, Tao Jin, Hao Lou, Farzad Farnoud, Quanquan Gu
To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$.
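The snippet does not spell out the algorithm, so the sketch below shows only the generic explore-then-commit template on a K-armed Bernoulli bandit, not the paper's dueling-bandit setting: explore every arm for a fixed budget, then commit to the empirically best arm for the remaining rounds. The function name, Bernoulli rewards, and a per-arm budget of ceil(T^(2/3) / K) pulls (chosen only to echo the T^(2/3) rate) are illustrative assumptions.

```python
import math
import random


def explore_then_commit(means, horizon, seed=0):
    """Generic explore-then-commit on a hypothetical K-armed Bernoulli bandit.

    Returns (committed_arm, total_reward). The per-arm exploration budget
    ceil(T^(2/3) / K) is an illustrative assumption.
    """
    rng = random.Random(seed)
    k = len(means)
    per_arm = max(1, math.ceil(horizon ** (2 / 3) / k))
    pulls = [0] * k
    wins = [0.0] * k
    total = 0.0
    t = 0

    def pull(arm):
        # Bernoulli reward with success probability means[arm].
        return 1.0 if rng.random() < means[arm] else 0.0

    # Explore: round-robin over arms until each has per_arm pulls or time runs out.
    while t < horizon and min(pulls) < per_arm:
        arm = t % k
        r = pull(arm)
        pulls[arm] += 1
        wins[arm] += r
        total += r
        t += 1

    # Commit: play the arm with the highest empirical mean for the rest of the horizon.
    best = max(range(k), key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
    for _ in range(horizon - t):
        total += pull(best)
    return best, total


print(explore_then_commit([0.2, 0.8], 10_000))
```

Exploring longer reduces the chance of committing to a bad arm but wastes more rounds on suboptimal arms; balancing those two costs is what produces T^(2/3)-type regret for this family of algorithms.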
2 code implementations • ICCV 2023 • Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao
However, despite researchers exploring cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of cross-lingual studies on visual speech.
no code implementations • ICCV 2023 • Wang Lin, Tao Jin, Ye Wang, Wenwen Pan, Linjun Li, Xize Cheng, Zhou Zhao
In this study, we propose a new task, group video captioning, which aims to infer the desired content among a group of target videos and describe it with another group of related reference videos.
no code implementations • NeurIPS 2021 • Tao Jin, Zhou Zhao
The majority of existing multimodal sequential learning methods focus on how to obtain effective representations and ignore the importance of multimodal fusion.
no code implementations • 8 Oct 2021 • Yue Wu, Tao Jin, Hao Lou, Pan Xu, Farzad Farnoud, Quanquan Gu
In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang
Tensor-based fusion methods have been proven effective in multimodal fusion tasks.
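The snippet does not say which tensor-based method is meant. As a hedged illustration, the widely cited Tensor Fusion Network formulation appends a constant 1 to each modality vector and takes their outer product, so the fused tensor contains unimodal, bimodal, and trimodal product terms. The function name and the pure-Python flattening below are assumptions, not this paper's method.

```python
import math
from itertools import product


def tensor_fusion(*modalities):
    """Flattened outer product of modality vectors, each augmented with a 1.

    Hypothetical helper illustrating TFN-style fusion. The output length is
    the product of (d_m + 1) over all modalities.
    """
    # Appending 1 keeps the lower-order (unimodal, bimodal) terms in the product.
    augmented = [[1.0, *vec] for vec in modalities]
    # Each combination picks one entry per modality; its product is one tensor cell.
    return [math.prod(combo) for combo in product(*augmented)]


text, audio, video = [2.0], [3.0], [5.0]
print(tensor_fusion(text, audio, video))
```

With one-dimensional inputs the eight cells are 1, the unimodal values 2, 3, 5, the bimodal products 6, 10, 15, and the trimodal product 30. The multiplicative growth of the output size with the number and width of modalities is the main cost of this design.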
no code implementations • 23 Jul 2020 • Tao Jin, Siyu Huang, Ming Chen, Yingming Li, Zhongfei Zhang
However, video captioning is a multimodal learning problem, and video features contain considerable redundancy across time steps.
1 code implementation • 3 Dec 2019 • Tao Jin, Pan Xu, Quanquan Gu, Farzad Farnoud
By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users.
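To make the relationship concrete: in Thurstone's framework each item i is perceived as s_i + e_i, and i beats j when that perceived score is higher. Gumbel noise yields the logistic BTL probability 1 / (1 + exp(-(s_i - s_j)/b)), while Gaussian noise with standard deviation b yields Phi((s_i - s_j) / (b·sqrt(2))). The sketch below assumes, for illustration only, that a user's accuracy is captured by the noise scale b; this is not necessarily how HTM parameterizes heterogeneity, and the function names are hypothetical.

```python
import math


def btl_prob(s_i, s_j, scale=1.0):
    """P(i beats j) with i.i.d. Gumbel noise of scale `scale` (BTL / logistic)."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j) / scale))


def thurstone_prob(s_i, s_j, scale=1.0):
    """P(i beats j) with i.i.d. Gaussian noise of std `scale` (Thurstone Case V)."""
    # The noise difference e_j - e_i has std scale * sqrt(2).
    z = (s_i - s_j) / (scale * math.sqrt(2.0))
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Same item scores, judged by a hypothetical accurate user and a noisy user.
for scale in (0.5, 2.0):
    print(scale, btl_prob(1.0, 0.0, scale), thurstone_prob(1.0, 0.0, scale))
```

A smaller noise scale pushes the win probability of the better item toward 1, while a larger scale pulls it toward 0.5, which is the kind of per-user accuracy difference a heterogeneous model needs to capture.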
no code implementations • IJCNLP 2019 • Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang
This paper addresses the challenging task of video captioning which aims to generate descriptions for video data.
no code implementations • 18 Apr 2019 • Ying Wang, Xiao Xu, Tao Jin, Xiang Li, Guotong Xie, Jian-Min Wang
In addition, for unordered sets of medical activities, existing medical RL methods use a simple pooling strategy, which makes the contributions of individual activities to learning indistinguishable.
no code implementations • 28 May 2016 • Chenghao Liu, Tao Jin, Steven C. H. Hoi, Peilin Zhao, Jianling Sun
In this paper, we propose a novel scheme of Online Bayesian Collaborative Topic Regression (OBCTR) which is efficient and scalable for learning from data streams.