Search Results for author: Jiaming Zhou

Found 24 papers, 5 papers with code

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching

no code implementations • 16 Feb 2025 • Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin

To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching.

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment

no code implementations • 30 Dec 2024 • Xuechen Wang, Shiwan Zhao, Haoqin Sun, Hui Wang, Jiaming Zhou, Yong Qin

Multimodal emotion recognition (MER), leveraging speech and text, has emerged as a pivotal domain within human-computer interaction, demanding sophisticated methods for effective multimodal integration.

cross-modal alignment, Multimodal Emotion Recognition

Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models

no code implementations • 23 Dec 2024 • Ge Zhang, Mohammad Ali Alomrani, Hongjian Gu, Jiaming Zhou, Yaochen Hu, Bin Wang, Qun Liu, Mark Coates, Yingxue Zhang, Jianye Hao

Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly in relational reasoning problems such as kinship or spatial reasoning.

Relational Reasoning, Spatial Reasoning

GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping

no code implementations • 19 Nov 2024 • Teli Ma, Zifan Wang, Jiaming Zhou, Mengmeng Wang, Junwei Liang

To address these limitations, we propose GLOVER, a unified Generalizable Open-Vocabulary Affordance Reasoning framework, which fine-tunes the Large Language Models (LLMs) to predict visual affordance of graspable object parts within RGB feature space.

Common Sense Reasoning, Human-Object Interaction Detection, +2

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data

1 code implementation • 19 Sep 2024 • Jiaming Zhou, Abbas Ghaddar, Ge Zhang, Liheng Ma, Yaochen Hu, Soumyasundar Pal, Mark Coates, Bin Wang, Yingxue Zhang, Jianye Hao

Despite recent advances in training and prompting strategies for Large Language Models (LLMs), these models continue to face challenges with complex logical reasoning tasks that involve long reasoning chains.

Logical Reasoning, Spatial Reasoning

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

no code implementations • 9 Sep 2024 • Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, Lei Xie, Hui Bu, Jiaming Zhou, Yong Qin, Jun Du, Ming Li, BinBin Zhang, Bin Jia

The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Uncertainty-Aware Mean Opinion Score Prediction

no code implementations • 23 Aug 2024 • Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, Xuechen Wang, Yong Qin

Mean Opinion Score (MOS) prediction has made significant progress in specific domains.

Prediction

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation

1 code implementation • 26 Jul 2024 • Shiyao Wang, Shiwan Zhao, Jiaming Zhou, Aobo Kong, Yong Qin

To address this issue, we introduce a prototype-based approach that markedly improves DSR performance for unseen dysarthric speakers without additional fine-tuning.

Contrastive Learning, speech-recognition, +1

Human-Centric Transformer for Domain Adaptive Action Recognition

no code implementations • 15 Jul 2024 • Kun-Yu Lin, Jiaming Zhou, Wei-Shi Zheng

However, existing methods are prone to losing human cues but prefer to exploit the correlation between non-human contexts and associated actions for recognition, and the contexts of interest agnostic to actions would reduce recognition performance in the target domain.

Action Recognition, Domain Adaptation

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs

no code implementations • 12 Jul 2024 • Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Jiaming Zhou, Haoqin Sun

Consequently, the self-prompt tuned LLMs can automatically generate expert role prompts for any given question.

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

no code implementations • 20 Jun 2024 • Jiaming Zhou, Teli Ma, Kun-Yu Lin, Zifan Wang, Ronghe Qiu, Junwei Liang

Our method employs a human-robot contrastive alignment loss to align the semantics of human and robot videos, adapting pre-trained models to the robot domain in a parameter-efficient manner.

Diversity

Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation

no code implementations • 14 Jun 2024 • Teli Ma, Jiaming Zhou, Zifan Wang, Ronghe Qiu, Junwei Liang

Developing robots capable of executing various manipulation tasks, guided by natural language instructions and visual observations of intricate real-world environments, remains a significant challenge in robotics.

Imitation Learning

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

no code implementations • 6 Jun 2024 • Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

CKGConv: General Graph Convolution with Continuous Kernels

1 code implementation • 21 Apr 2024 • Liheng Ma, Soumyasundar Pal, Yitian Zhang, Jiaming Zhou, Yingxue Zhang, Mark Coates

In this work, we propose a novel and general graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding.

Graph Classification, Graph Learning, +2

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

1 code implementation • 3 Mar 2024 • Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yu-Ming Tang, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng

To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action, and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps.

Open Vocabulary Action Recognition

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition

no code implementations • 22 Jan 2024 • Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng

With the proposed ActionHub dataset, we further propose a novel Cross-modality and Cross-action Modeling (CoCo) framework for ZSAR, which consists of a Dual Cross-modality Alignment module and a Cross-action Invariance Mining module.

Action Recognition, Video Description, +1

GeoDeformer: Geometric Deformable Transformer for Action Recognition

no code implementations • 29 Nov 2023 • Jinhui Ye, Jiaming Zhou, Hui Xiong, Junwei Liang

Specifically, at the core of GeoDeformer is the Geometric Deformation Predictor, a module designed to identify and quantify potential spatial and temporal geometric deformations within the given video.

Action Recognition

Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition

no code implementations • 28 Nov 2023 • Jiaming Zhou, Hanjun Li, Kun-Yu Lin, Junwei Liang

To this end, this work aims to build a weakly supervised end-to-end framework for training recognition models on long videos, with only video-level action category labels.

Action Classification, Action Recognition, +6

CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

no code implementations • 26 Jul 2023 • Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li

RNN-T models are widely used in ASR, which rely on the RNN-T loss to achieve length alignment between input audio and target sequence.

Automatic Speech Recognition, speech-recognition, +1

MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition

no code implementations • 22 Feb 2023 • Jiaming Zhou, Shiwan Zhao, Ning Jiang, Guoqing Zhao, Yong Qin

Unsupervised domain adaptation (UDA) aims to improve the performance on the unlabeled target domain by transferring knowledge from the source to the target domain.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Graph-Based High-Order Relation Modeling for Long-Term Action Recognition

no code implementations • CVPR 2021 • Jiaming Zhou, Kun-Yu Lin, Haoxin Li, Wei-Shi Zheng

In this paper, we propose a Graph-based High-order Relation Modeling (GHRM) module to exploit the high-order relations in the long-term actions for long-term action recognition.

Action Recognition, Long-video Activity Recognition, +3
