Search Results for author: Tao Jin

Found 21 papers, 7 papers with code

Prior Knowledge and Memory Enriched Transformer for Sign Language Translation

no code implementations Findings (ACL) 2022 Tao Jin, Zhou Zhao, Meng Zhang, Xingshan Zeng

This paper tackles the challenging problem of sign language translation (SLT), which involves not only visual and textual understanding but also the learning of additional prior knowledge (i.e., performing style and syntax).

POS, Sentence (+2 more)

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

no code implementations 18 Mar 2024 Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to explicitly control the style attributes of the synthesized singing.

Attribute, Singing Voice Synthesis

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

no code implementations 6 Mar 2024 Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe

Simulation is an invaluable tool for radio-frequency system designers that enables rapid prototyping of various algorithms for imaging, target detection, classification, and tracking.

Novel View Synthesis

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

no code implementations 23 Dec 2023 Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao

However, talking head translation, which converts audio-visual speech (i.e., talking-head video) from one language into another, still faces several challenges compared to audio speech: (1) existing methods invariably rely on cascades that synthesize via both audio and text, which introduces delays and cascading errors.

Self-Supervised Learning, Speech-to-Speech Translation (+1 more)

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

2 code implementations 13 Dec 2023 Haifeng Huang, Zehan Wang, Rongjie Huang, Luping Liu, Xize Cheng, Yang Zhao, Tao Jin, Zhou Zhao

These tokens capture the object's attributes and spatial relationships with surrounding objects in the 3D scene.

Attribute, Object (+1 more)

Extending Multi-modal Contrastive Representations

1 code implementation 13 Oct 2023 Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao

Inspired by recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method to flexibly learn unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces.

3D Object Classification, Representation Learning (+1 more)
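The snippet does not spell out the training objective behind a contrastive representation space. As a point of reference only, the sketch below shows a generic symmetric InfoNCE (CLIP-style) loss between two modalities; it is not Ex-MCR itself, and the function names and the temperature of 0.07 are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """Generic CLIP-style contrastive loss (hypothetical helper, not Ex-MCR).

    Row i of `a` and row i of `b` form a positive pair; every other row in
    the batch acts as a negative. The 0.07 temperature is an illustrative choice.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(a))
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()  # match each a_i to b_i
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()  # match each b_i to a_i
    return float((loss_a_to_b + loss_b_to_a) / 2)


# Toy check: paired embeddings should give a lower loss than mismatched ones.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned = symmetric_info_nce(x, x + 0.01 * rng.normal(size=x.shape))
mismatched = symmetric_info_nce(x, np.roll(x, 1, axis=0))  # every pair is wrong
print(aligned < mismatched)  # True
```

Pulling matched pairs together while pushing the rest of the batch apart is what makes the two modalities share one embedding space.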

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

no code implementations 2 Oct 2023 Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems.

Computational Efficiency, Decision Making (+2 more)

Gloss Attention for Gloss-free Sign Language Translation

1 code implementation CVPR 2023 Aoxiong Yin, Tianyun Zhong, Li Tang, Weike Jin, Tao Jin, Zhou Zhao

We find that gloss annotations provide two kinds of information to the model: (1) they help the model implicitly learn the location of semantic boundaries in continuous sign language videos, and (2) they help the model understand the sign language video globally.

Gloss-free Sign Language Translation, Language Modelling (+4 more)

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment

1 code implementation 10 Jun 2023 Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao

We demonstrate that OpenSR enables modality transfer from one to any in three different settings (zero-, few- and full-shot), and achieves highly competitive zero-shot performance compared to the existing few-shot and full-shot lip-reading methods.

Audio-Visual Speech Recognition, Lip Reading (+2 more)

Borda Regret Minimization for Generalized Linear Dueling Bandits

no code implementations 15 Mar 2023 Yue Wu, Tao Jin, Hao Lou, Farzad Farnoud, Quanquan Gu

To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$.

Recommendation Systems
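To make the explore-then-commit idea concrete, here is a deliberately simplified sketch for a finite-armed dueling bandit with no contexts and no generalized linear model, so it is not the paper's algorithm. It assumes the Borda score B(i) of an arm is its average probability of beating the other arms, and that per-round Borda regret is 2B(i*) - B(i_t) - B(j_t). The preference matrix, exploration length, and function names are hypothetical.

```python
import numpy as np


def borda_scores(pref: np.ndarray) -> np.ndarray:
    # pref[i, j] = P(arm i beats arm j). Borda score = mean win probability vs. the other arms.
    k = pref.shape[0]
    return (pref.sum(axis=1) - np.diag(pref)) / (k - 1)


def explore_then_commit(pref, horizon, n_explore, rng):
    """Simplified ETC for the Borda winner of a K-armed dueling bandit (hypothetical helper).

    Finite arms, no contexts: an illustration of the ETC idea only.
    Returns (committed arm, cumulative Borda regret).
    """
    k = pref.shape[0]
    wins = np.zeros((k, k))
    plays = np.zeros((k, k))
    true_borda = borda_scores(pref)
    best = true_borda.max()
    regret = 0.0
    # Exploration: duel uniformly random pairs of distinct arms.
    for _ in range(n_explore):
        i, j = rng.choice(k, size=2, replace=False)
        i_won = float(rng.random() < pref[i, j])
        wins[i, j] += i_won
        wins[j, i] += 1.0 - i_won
        plays[i, j] += 1
        plays[j, i] += 1
        regret += 2 * best - true_borda[i] - true_borda[j]
    # Estimate the preference matrix; unseen pairs default to 0.5.
    est = np.where(plays > 0, wins / np.maximum(plays, 1), 0.5)
    committed = int(np.argmax(borda_scores(est)))
    # Commit: duel the estimated Borda winner against itself for the remaining rounds.
    regret += (horizon - n_explore) * 2 * (best - true_borda[committed])
    return committed, regret


# Hypothetical 3-arm instance; each pair satisfies pref[i, j] + pref[j, i] = 1.
pref = np.array([[0.5, 0.8, 0.9],
                 [0.2, 0.5, 0.7],
                 [0.1, 0.3, 0.5]])
arm, regret = explore_then_commit(pref, horizon=10_000, n_explore=600,
                                  rng=np.random.default_rng(0))
print(arm, round(regret, 1))
```

In textbook ETC analyses the exploration length is tuned on the order of T^{2/3} (times problem-dependent factors), which is where a T^{2/3} regret rate typically comes from; n_explore=600 here is arbitrary.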

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

2 code implementations ICCV 2023 Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao

However, although researchers have explored cross-lingual techniques such as machine translation and audio speech translation to overcome language barriers, cross-lingual studies on visual speech remain scarce.

Lip Reading, Machine Translation (+4 more)

Exploring Group Video Captioning with Efficient Relational Approximation

no code implementations ICCV 2023 Wang Lin, Tao Jin, Ye Wang, Wenwen Pan, Linjun Li, Xize Cheng, Zhou Zhao

In this study, we propose a new task, group video captioning, which aims to infer the desired content among a group of target videos and describe it with another group of related reference videos.

Video Captioning

Generalizable Multi-linear Attention Network

no code implementations NeurIPS 2021 Tao Jin, Zhou Zhao

The majority of existing multimodal sequential learning methods focus on how to obtain effective representations and ignore the importance of multimodal fusion.

Multimodal Sentiment Analysis, Retrieval (+1 more)

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

1 code implementation 8 Oct 2021 Yue Wu, Tao Jin, Hao Lou, Pan Xu, Farzad Farnoud, Quanquan Gu

In heterogeneous rank aggregation problems, users often exhibit varying levels of accuracy when comparing pairs of items.

SBAT: Video Captioning with Sparse Boundary-Aware Transformer

no code implementations 23 Jul 2020 Tao Jin, Siyu Huang, Ming Chen, Yingming Li, Zhongfei Zhang

However, video captioning is a multimodal learning problem, and the video features contain substantial redundancy across time steps.

Machine Translation, Text Generation (+2 more)

Rank Aggregation via Heterogeneous Thurstone Preference Models

1 code implementation 3 Dec 2019 Tao Jin, Pan Xu, Quanquan Gu, Farzad Farnoud

By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users.
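A minimal sketch of one way to make BTL heterogeneous, assuming each user u has an accuracy γ_u > 0 that scales the score gap, so that P(i ≻ j | u) = σ(γ_u (s_i - s_j)) with σ the logistic function; with every γ_u = 1 this reduces to plain BTL, and swapping σ for the standard normal CDF would give a Thurstone Case V analogue. The γ values are treated as known to keep the example short. The function names, simulated data, and learning rate are hypothetical, and the HTM paper's actual estimator may differ.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_heterogeneous_btl(comparisons, n_items, gamma, lr=0.1, n_iters=2000):
    """Maximum-likelihood scores by gradient ascent (hypothetical helper).

    comparisons: list of (user, winner, loser); gamma[u] > 0 is user u's assumed-known accuracy.
    Assumed model: P(winner beats loser | u) = sigmoid(gamma[u] * (s_w - s_l)).
    """
    users = np.array([c[0] for c in comparisons])
    win = np.array([c[1] for c in comparisons])
    lose = np.array([c[2] for c in comparisons])
    g = np.asarray(gamma, dtype=float)[users]
    s = np.zeros(n_items)
    for _ in range(n_iters):
        # d/ds_w log sigmoid(g * (s_w - s_l)) = g * (1 - sigmoid(g * (s_w - s_l)))
        resid = g * (1.0 - sigmoid(g * (s[win] - s[lose])))
        grad = np.zeros(n_items)
        np.add.at(grad, win, resid)    # accumulate over repeated item indices
        np.add.at(grad, lose, -resid)
        s += lr * grad / len(comparisons)
        s -= s.mean()  # scores are identifiable only up to an additive constant
    return s


# Simulate one accurate user (gamma=3.0) and one noisy user (gamma=0.3).
rng = np.random.default_rng(1)
true_s = np.array([1.0, 0.0, -1.0])
gamma = [3.0, 0.3]
comps = []
for _ in range(3000):
    u = int(rng.integers(2))
    i, j = rng.choice(3, size=2, replace=False)
    p = sigmoid(gamma[u] * (true_s[i] - true_s[j]))
    comps.append((u, i, j) if rng.random() < p else (u, j, i))

est = fit_heterogeneous_btl(comps, 3, gamma)
print(np.argsort(-est))  # best item first: [0 1 2]
```

Weighting each comparison's gradient by γ_u lets the accurate user's votes dominate, which is the intuition behind modeling users heterogeneously instead of pooling all comparisons equally.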

Inpatient2Vec: Medical Representation Learning for Inpatients

no code implementations 18 Apr 2019 Ying Wang, Xiao Xu, Tao Jin, Xiang Li, Guotong Xie, Jian-Min Wang

In addition, for an unordered set of medical activities, existing medical representation learning (RL) methods use a simple pooling strategy, which makes the activities' contributions to learning indistinguishable.

Representation Learning, Semantic Similarity (+1 more)
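To illustrate the pooling issue, the sketch below contrasts mean pooling, which gives every activity in the set the same weight, with a generic softmax attention pooling whose weights depend on each activity's embedding. Both are invariant to the order of the set. The attention variant and its query vector are illustrative assumptions, not necessarily what Inpatient2Vec uses, and the embeddings are random stand-ins.

```python
import numpy as np


def mean_pool(activities: np.ndarray) -> np.ndarray:
    # Each of the n activity embeddings contributes exactly 1/n, whatever its content.
    return activities.mean(axis=0)


def attention_pool(activities: np.ndarray, query: np.ndarray):
    """Softmax attention over an unordered set; `query` is a hypothetical learned vector."""
    scores = activities @ query
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ activities, weights


rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))   # 4 hypothetical activity embeddings of size 8
query = rng.normal(size=8)       # hypothetical query vector (would be learned)
pooled, weights = attention_pool(acts, query)
print(np.round(weights, 3))      # data-dependent weights instead of a uniform 0.25 each

# Both poolings ignore the order of the set.
reordered = acts[::-1]
assert np.allclose(mean_pool(reordered), mean_pool(acts))
assert np.allclose(attention_pool(reordered, query)[0], pooled)
```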

Online Bayesian Collaborative Topic Regression

no code implementations 28 May 2016 Chenghao Liu, Tao Jin, Steven C. H. Hoi, Peilin Zhao, Jianling Sun

In this paper, we propose a novel scheme of Online Bayesian Collaborative Topic Regression (OBCTR) which is efficient and scalable for learning from data streams.

Recommendation Systems, Regression
