Search Results for author: Huan Zhou

Found 10 papers, 2 papers with code

Unimodal and Crossmodal Refinement Network for Multimodal Sequence Fusion

no code implementations EMNLP 2021 Xiaobao Guo, Adams Kong, Huan Zhou, Xianfeng Wang, Min Wang

Specifically, to improve unimodal representations, a unimodal refinement module is designed to refine modality-specific learning via iteratively updating the distribution with transformer-based attention layers.

Representation Learning

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

no code implementations 8 Apr 2024 He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie

Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.

Lipreading Lip Reading +1

Exploiting Low-level Representations for Ultra-Fast Road Segmentation

1 code implementation 4 Feb 2024 Huan Zhou, Feng Xue, Yucong Li, Shi Gong, Yiqun Li, Yu Zhou

The spatial detail branch is first designed to extract a low-level feature representation of the road using the first stage of ResNet-18.

Road Segmentation

X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion

no code implementations 9 Mar 2023 Kai Liu, Ziqing Du, Xucheng Wan, Huan Zhou

To mitigate the pressing speaker confusion (SC) issue, we reformulate the training objective and propose two novel loss schemes that explore the metric of reconstruction improvement performance defined at the small-chunk level and leverage the distribution information associated with that metric.

Speech Extraction

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

no code implementations 16 Jan 2023 Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou

As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of the desired speaker using an additional speaker cue extracted from that speaker.

Speaker Verification Speech Separation +1

CGI-Stereo: Accurate and Real-Time Stereo Matching via Context and Geometry Interaction

1 code implementation 7 Jan 2023 Gangwei Xu, Huan Zhou, Xin Yang

In this paper, we propose CGI-Stereo, a novel neural network architecture that can concurrently achieve real-time performance, competitive accuracy, and strong generalization ability.

Stereo Matching

Joint Speech Activity and Overlap Detection with Multi-Exit Architecture

no code implementations 24 Sep 2022 Ziqing Du, Kai Liu, Xucheng Wan, Huan Zhou

Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios.

Action Detection Activity Detection +1

Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations

no code implementations 24 Sep 2022 Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou

To validate the effectiveness of our proposed model, extensive experiments are conducted on the DNS2020 dataset.

Speech Enhancement

Container Orchestration on HPC Systems

no code implementations 16 Dec 2020 Naweiluo Zhou, Yiannis Georgiou, Li Zhong, Huan Zhou, Marcin Pospieszny

Containerisation demonstrates its efficiency in application deployment in cloud computing.

Distributed, Parallel, and Cluster Computing
