Search Results for author: Tao Jin

Found 21 papers, 7 papers with code

Prior Knowledge and Memory Enriched Transformer for Sign Language Translation

no code implementations • Findings (ACL) 2022 • Tao Jin, Zhou Zhao, Meng Zhang, Xingshan Zeng

This paper attacks the challenging problem of sign language translation (SLT), which involves not only visual and textual understanding but also additional prior knowledge learning (i.e., performing style, syntax).

POS • Sentence • +2

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

no code implementations • 18 Mar 2024 • Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly.

Attribute • Singing Voice Synthesis

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

no code implementations • 6 Mar 2024 • Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe

Simulation is an invaluable tool for radio-frequency system designers that enables rapid prototyping of various algorithms for imaging, target detection, classification, and tracking.

Novel View Synthesis

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

no code implementations • 23 Dec 2023 • Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao

However, talking head translation, converting audio-visual speech (i.e., talking head video) from one language into another, still confronts several challenges compared to audio speech: (1) Existing methods invariably rely on cascading, synthesizing via both audio and text, resulting in delays and cascading errors.

Self-Supervised Learning • Speech-to-Speech Translation • +1

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

2 code implementations • 13 Dec 2023 • Haifeng Huang, Zehan Wang, Rongjie Huang, Luping Liu, Xize Cheng, Yang Zhao, Tao Jin, Zhou Zhao

These tokens capture the object's attributes and spatial relationships with surrounding objects in the 3D scene.

Attribute • Object • +1

Extending Multi-modal Contrastive Representations

1 code implementation • 13 Oct 2023 • Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao

Inspired by recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method to flexibly learn unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces.

3D Object Classification • Representation Learning • +1

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

no code implementations • 2 Oct 2023 • Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems.

Computational Efficiency • Decision Making • +2

Gloss Attention for Gloss-free Sign Language Translation

1 code implementation • CVPR 2023 • Aoxiong Yin, Tianyun Zhong, Li Tang, Weike Jin, Tao Jin, Zhou Zhao

We find that it can provide two aspects of information for the model: 1) it can help the model implicitly learn the location of semantic boundaries in continuous sign language videos, and 2) it can help the model understand the sign language video globally.

Ranked #3 on Gloss-free Sign Language Translation on PHOENIX14T

Gloss-free Sign Language Translation • Language Modelling • +4

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment

1 code implementation • 10 Jun 2023 • Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao

We demonstrate that OpenSR enables modality transfer from one to any in three different settings (zero-, few- and full-shot), and achieves highly competitive zero-shot performance compared to the existing few-shot and full-shot lip-reading methods.

Audio-Visual Speech Recognition • Lip Reading • +2

DATE: Domain Adaptive Product Seeker for E-commerce

no code implementations • CVPR 2023 • Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao

Then, we present two cooperative seekers to simultaneously search the image for PR and localize the product for PG.

Domain Adaptation • Retrieval • +1

Borda Regret Minimization for Generalized Linear Dueling Bandits

no code implementations • 15 Mar 2023 • Yue Wu, Tao Jin, Hao Lou, Farzad Farnoud, Quanquan Gu

To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$.

Recommendation Systems

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

2 code implementations • ICCV 2023 • Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao

However, despite researchers exploring cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of cross-lingual studies on visual speech.

Lip Reading • Machine Translation • +4

Exploring Group Video Captioning with Efficient Relational Approximation

no code implementations • ICCV 2023 • Wang Lin, Tao Jin, Ye Wang, Wenwen Pan, Linjun Li, Xize Cheng, Zhou Zhao

In this study, we propose a new task, group video captioning, which aims to infer the desired content among a group of target videos and describe it with another group of related reference videos.

Video Captioning

Generalizable Multi-linear Attention Network

no code implementations • NeurIPS 2021 • Tao Jin, Zhou Zhao

The majority of existing multimodal sequential learning methods focus on how to obtain effective representations and ignore the importance of multimodal fusion.

Multimodal Sentiment Analysis • Retrieval • +1

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

1 code implementation • 8 Oct 2021 • Yue Wu, Tao Jin, Hao Lou, Pan Xu, Farzad Farnoud, Quanquan Gu

In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items.

Dual Low-Rank Multimodal Fusion

no code implementations • Findings of the Association for Computational Linguistics 2020 • Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang

Tensor-based fusion methods have been proven effective in multimodal fusion tasks.

SBAT: Video Captioning with Sparse Boundary-Aware Transformer

no code implementations • 23 Jul 2020 • Tao Jin, Siyu Huang, Ming Chen, Yingming Li, Zhongfei Zhang

However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps.

Machine Translation • Text Generation • +2

Rank Aggregation via Heterogeneous Thurstone Preference Models

1 code implementation • 3 Dec 2019 • Tao Jin, Pan Xu, Quanquan Gu, Farzad Farnoud

By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users.

Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning

no code implementations • IJCNLP 2019 • Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang

This paper addresses the challenging task of video captioning which aims to generate descriptions for video data.

Tensor Decomposition • Video Captioning

Inpatient2Vec: Medical Representation Learning for Inpatients

no code implementations • 18 Apr 2019 • Ying Wang, Xiao Xu, Tao Jin, Xiang Li, Guotong Xie, Jian-Min Wang

In addition, for unordered medical activity sets, existing medical representation learning (RL) methods utilize a simple pooling strategy, which results in indistinguishable contributions among the activities for learning.

Representation Learning • Semantic Similarity • +1

Online Bayesian Collaborative Topic Regression

no code implementations • 28 May 2016 • Chenghao Liu, Tao Jin, Steven C. H. Hoi, Peilin Zhao, Jianling Sun

In this paper, we propose a novel scheme of Online Bayesian Collaborative Topic Regression (OBCTR) which is efficient and scalable for learning from data streams.

Recommendation Systems • regression
