Search Results for author: Min Tang

Found 28 papers, 9 papers with code

DeepRTE: Pre-trained Attention-based Neural Network for Radiative Transfer

1 code implementation • 29 May 2025 • Yekun Zhu, Min Tang, Zheng Ma

In this study, we propose a novel neural network approach, termed DeepRTE, to address the steady-state Radiative Transfer Equation (RTE).

Computational Efficiency RTE

Flow Matching based Sequential Recommender Model

1 code implementation • 22 May 2025 • Feng Liu, Lixin Zou, Xiangyu Zhao, Min Tang, Liming Dong, Dan Luo, Xiangyang Luo, Chenliang Li

Generative models, particularly diffusion models, have emerged as powerful tools for sequential recommendation.

model Sequential Recommendation

RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation

no code implementations • 24 Mar 2025 • Xiaolong Yin, Xingyu Lu, Jiahang Shen, Jingzhe Ni, Hailong Li, Ruofeng Tong, Min Tang, Peng Du

A CAD command sequence is a typical parametric design paradigm in 3D CAD systems where a model is constructed by overlaying 2D sketches with operations such as extrusion, revolution, and Boolean operations.

Reinforcement Learning (RL)

Efficient Sparse Attention needs Adaptive Token Release

no code implementations • 2 Jul 2024 • Chaoran Zhang, Lixin Zou, Dan Luo, Min Tang, Xiangyang Luo, Zihao Li, Chenliang Li

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide array of text-centric tasks.

Text Generation

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

1 code implementation • 26 Jun 2024 • Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility.

text-to-speech Text to Speech

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

no code implementations • 9 Jun 2024 • Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda

Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements.

Denoising Speech Denoising +4

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

no code implementations • 6 Jun 2024 • Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

We also show that the proposed MaskGIT-based model can generate phoneme durations with higher quality and diversity compared to its regression or flow-matching counterparts.

Diversity text-to-speech +1

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

no code implementations • 12 Feb 2024 • Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression.

text-to-speech Text to Speech

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

no code implementations • 16 Jan 2024 • Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics.

Automatic Speech Recognition Benchmarking +4

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations • 14 Aug 2023 • Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modeling Language Modelling +5

CTSN: Predicting Cloth Deformation for Skeleton-based Characters with a Two-stream Skinning Network

no code implementations • 30 May 2023 • Yudi Li, Min Tang, Yun Yang, Ruofeng Tong, Shuangcai Yang, Yao Li, Bailin An, Qilong Kou

We present a novel learning method to predict the cloth deformation for skeleton-based characters with a two-stream network.

Real-Time Audio-Visual End-to-End Speech Enhancement

no code implementations • 13 Mar 2023 • Zirun Zhu, Hemin Yang, Min Tang, ZiYi Yang, Sefik Emre Eskimez, Huaming Wang

In this paper, we propose a low-latency real-time audio-visual end-to-end enhancement (AV-E3Net) model based on the recently proposed end-to-end enhancement network (E3Net).

Speech Enhancement Task 2

StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping

1 code implementation • CVPR 2023 • Diqiong Jiang, Dan Song, Ruofeng Tong, Min Tang

StyleIPSB gives us a novel tool for high-fidelity face swapping, and we propose a three-stage framework for face swapping with StyleIPSB.

Attribute Face Swapping

Exploring WavLM on Speech Enhancement

no code implementations • 18 Nov 2022 • Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

There has been a surge of interest in self-supervised learning approaches for end-to-end speech encoding in recent years, as they have achieved great success.

Self-Supervised Learning Speech Enhancement +2

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation

no code implementations • 4 Nov 2022 • Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Parnamaa, Huaming Wang

Personalized speech enhancement (PSE) is a real-time SE approach utilizing a speaker embedding of a target person to remove background noise, reverberation, and interfering voices.

Acoustic echo cancellation Multi-Task Learning +1

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks

no code implementations • 13 Dec 2021 • Yudi Li, Min Tang, Yun Yang, Zi Huang, Ruofeng Tong, Shuangcai Yang, Yao Li, Dinesh Manocha

We present a novel mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction.

Sphere Face Model: A 3D Morphable Model with Hypersphere Manifold Latent Space

no code implementations • 4 Dec 2021 • Diqiong Jiang, Yiwei Jin, FangLue Zhang, Zhe Zhu, Yun Zhang, Ruofeng Tong, Min Tang

However, the shape parameters of traditional 3DMMs satisfy the multivariate Gaussian distribution while the identity embeddings satisfy the hypersphere distribution, and this conflict makes it challenging for face reconstruction models to preserve the faithfulness and the shape consistency simultaneously.

Face Model Face Reconstruction

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Reconstructing Recognizable 3D Face Shapes based on 3D Morphable Models

no code implementations • 8 Apr 2021 • Diqiong Jiang, Yiwei Jin, FangLue Zhang, Yukun Yai, Risheng Deng, Ruofeng Tong, Min Tang

We compare our method with existing methods in terms of the reconstruction error, visual distinguishability, and face recognition accuracy of the shape parameters.

Face Recognition

Hierarchical Optimization Time Integration for CFL-rate MPM Stepping

1 code implementation • 18 Nov 2019 • Xinlei Wang, Minchen Li, Yu Fang, Xinxin Zhang, Ming Gao, Min Tang, Danny M. Kaufman, Chenfanfu Jiang

We propose Hierarchical Optimization Time Integration (HOT) for efficient implicit time-stepping of the Material Point Method (MPM) irrespective of simulated materials and conditions.

Graphics

Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation

1 code implementation • 14 Aug 2018 • Chengyang Li, Dan Song, Ruofeng Tong, Min Tang

To narrow this gap, we propose a network fusion architecture, which consists of a multispectral proposal network to generate pedestrian proposals, and a subsequent multispectral classification network to distinguish pedestrian instances from hard negatives.

Autonomous Driving Multispectral Object Detection +2

Illumination-aware Faster R-CNN for Robust Multispectral Pedestrian Detection

no code implementations • 14 Mar 2018 • Chengyang Li, Dan Song, Ruofeng Tong, Min Tang

Multispectral images of color-thermal pairs have been shown to be more effective than a single color channel for pedestrian detection, especially under challenging illumination conditions.

Multispectral Object Detection Pedestrian Detection

End-to-end detection-segmentation network with ROI convolution

1 code implementation • 8 Jan 2018 • Zichen Zhang, Min Tang, Dana Cobzas, Dornoosh Zonoobi, Martin Jagersand, Jacob L. Jaremko

We propose an end-to-end neural network that improves the segmentation accuracy of fully convolutional networks by incorporating a localization unit.

Object Localization Segmentation

A deep level set method for image segmentation

no code implementations • 17 May 2017 • Min Tang, Sepehr Valipour, Zichen Vincent Zhang, Dana Cobzas, Martin Jagersand

This paper proposes a novel image segmentation approach that integrates fully convolutional networks (FCNs) with a level set model.

Image Segmentation Semantic Segmentation
