Search Results for author: Hao Tang

Found 82 papers, 40 papers with code

TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation

no code implementations 18 Oct 2021 Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin, Xiaohui Xie

The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views.

3D Human Pose Estimation 3D Pose Estimation

Equivariant Neural Network for Factor Graphs

no code implementations 29 Sep 2021 Fan-Yun Sun, Jonathan Kuck, Hao Tang, Stefano Ermon

Several indices used in a factor graph data structure can be permuted without changing the underlying probability distribution.

On the Difficulty of Segmenting Words with Attention

no code implementations 21 Sep 2021 Ramon Sanabria, Hao Tang, Sharon Goldwater

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks.

Speech Recognition Translation

Multi-Sample based Contrastive Loss for Top-k Recommendation

no code implementations 1 Sep 2021 Hao Tang, Guoshuai Zhao, Yuxia Wu, Xueming Qian

Therefore, we propose a Multi-Sample based Contrastive Loss (MSCL) function which solves the two problems by balancing the importance of positive and negative samples and data augmentation.

Contrastive Learning Data Augmentation +1

Layout-to-Image Translation with Double Pooling Generative Adversarial Networks

1 code implementation 29 Aug 2021 Hao Tang, Nicu Sebe

In this paper, we address the task of layout-to-image translation, which aims to translate an input semantic layout to a realistic image.

Translation

Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer

1 code implementation ICCV 2021 Haoyu Chen, Hao Tang, Henglin Shi, Wei Peng, Nicu Sebe, Guoying Zhao

With the strength of deep generative models, 3D pose transfer regains intensive research interests in recent years.

Pose Transfer

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

1 code implementation ICCV 2021 Hao Tang, Xingwei Liu, Shanlin Sun, Xiangyi Yan, Xiaohui Xie

Although having achieved great success in medical image segmentation, deep convolutional neural networks usually require a large dataset with manual annotations for training and are difficult to generalize to unseen classes.

Few-Shot Learning Medical Image Segmentation

Cross-View Exocentric to Egocentric Video Synthesis

no code implementations 7 Jul 2021 Gaowen Liu, Hao Tang, Hugo Latapie, Jason Corso, Yan Yan

Particularly, we propose a novel Bi-directional Spatial Temporal Attention Fusion Generative Adversarial Network (STA-GAN) to learn both spatial and temporal information to generate egocentric video sequences from the exocentric view.

Video Generation

Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes

1 code implementation 21 Jun 2021 Hao Tang, Nicu Sebe

Both generators are mutually connected and trained in an end-to-end fashion and explicitly form three cycled subnets, i.e., one image generation cycle and two guidance generation cycles.

Image-to-Image Translation Translation

Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization

1 code implementation 31 May 2021 Jichao Zhang, Aliaksandr Siarohin, Hao Tang, Jingjing Chen, Enver Sangineto, Wei Wang, Nicu Sebe

Controllable person image generation aims to produce realistic human images with desirable attributes (e.g., the given pose, cloth textures or hair style).

Image-to-Image Translation Pose Transfer +1

Transformer-Based Source-Free Domain Adaptation

2 code implementations 28 May 2021 Guanglei Yang, Hao Tang, Zhun Zhong, Mingli Ding, Ling Shao, Nicu Sebe, Elisa Ricci

In this paper, we study the task of source-free domain adaptation (SFDA), where the source data are not available during target adaptation.

Domain Adaptation Knowledge Distillation

Cloth Interactive Transformer for Virtual Try-On

1 code implementation 12 Apr 2021 Bin Ren, Hao Tang, Fanyang Meng, Runwei Ding, Ling Shao, Philip H. S. Torr, Nicu Sebe

2D image-based virtual try-on has attracted increased attention from the multimedia and computer vision communities.

Virtual Try-on

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

1 code implementation ICCV 2021 Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci

While convolutional neural networks have shown a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution operation.

Depth Estimation

Adversarial Shape Learning for Building Extraction in VHR Remote Sensing Images

1 code implementation 22 Feb 2021 Lei Ding, Hao Tang, Yahui Liu, Yilei Shi, Xiao Xiang Zhu, Lorenzo Bruzzone

To address this issue, we propose an adversarial shape learning network (ASLNet) to model the building shape patterns that improve the accuracy of building segmentation.

Semantically-Adaptive Upsampling for Layout-to-Image Translation

no code implementations 1 Jan 2021 Hao Tang, Nicu Sebe

We propose the Semantically-Adaptive UpSampling (SA-UpSample), a general and highly effective upsampling method for the layout-to-image translation task.

Translation

Spatial Context-Aware Self-Attention Model For Multi-Organ Segmentation

no code implementations 16 Dec 2020 Hao Tang, Xingwei Liu, Kun Han, Shanlin Sun, Narisu Bai, Xuming Chen, Huang Qian, Yong liu, Xiaohui Xie

State-of-the-art CNN segmentation models apply either 2D or 3D convolutions on input images, with pros and cons associated with each method: 2D convolution is fast, less memory-intensive but inadequate for extracting 3D contextual information from volumetric images, while the opposite is true for 3D convolution.

Semantic Segmentation

Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs

no code implementations NeurIPS 2020 Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su

Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.

Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous Graph Neural Networks

no code implementations 26 Oct 2020 Hao Tang, Zhiao Huang, Jiayuan Gu, Bao-liang Lu, Hao Su

Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems.

Dual Attention GANs for Semantic Image Synthesis

1 code implementation 29 Aug 2020 Hao Tang, Song Bai, Nicu Sebe

We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively.

Image Generation

Bipartite Graph Reasoning GANs for Person Image Generation

1 code implementation 10 Aug 2020 Hao Tang, Song Bai, Philip H. S. Torr, Nicu Sebe

We present a novel Bipartite Graph Reasoning GAN (BiGraphGAN) for the challenging person image generation task.

Pose Transfer

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild

1 code implementation 9 Aug 2020 Jichao Zhang, Jingjing Chen, Hao Tang, Wei Wang, Yan Yan, Enver Sangineto, Nicu Sebe

In this paper we address the problem of unsupervised gaze correction in the wild, presenting a solution that works without the need for precise annotations of the gaze angle and the head pose.

Quantum Computation for Pricing the Collateralized Debt Obligations

no code implementations 6 Aug 2020 Hao Tang, Anurag Pal, Lu-Feng Qiao, Tian-Yu Wang, Jun Gao, Xian-Min Jin

Collateralized debt obligation (CDO) has been one of the most commonly used structured financial products and is intensively studied in quantitative finance.

XingGAN for Person Image Generation

2 code implementations ECCV 2020 Hao Tang, Song Bai, Li Zhang, Philip H. S. Torr, Nicu Sebe

We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i.e., translating the pose of a given person to a desired one.

Pose Transfer

AMR Parsing with Latent Structural Information

no code implementations ACL 2020 Qiji Zhou, Yue Zhang, Donghong Ji, Hao Tang

Abstract Meaning Representations (AMRs) capture sentence-level semantics structural representations to broad-coverage natural sentences.

AMR Parsing

Relevant Region Prediction for Crowd Counting

no code implementations 20 May 2020 Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, Hao Tang

The module builds a fully connected directed graph between the regions of different density where each node (region) is represented by weighted global pooled feature, and GCN is learned to map this region graph to a set of relation-aware regions representations.

Crowd Counting

Vector-Quantized Autoregressive Predictive Coding

2 code implementations 17 May 2020 Yu-An Chung, Hao Tang, James Glass

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.

Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis

1 code implementation 31 Mar 2020 Hao Tang, Xiaojuan Qi, Dan Xu, Philip H. S. Torr, Nicu Sebe

To tackle the first challenge, we propose to use the edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module.

Image Generation

Exocentric to Egocentric Image Generation via Parallel Generative Adversarial Network

no code implementations 8 Feb 2020 Gaowen Liu, Hao Tang, Hugo Latapie, Yan Yan

In this paper, we investigate exocentric (third-person) view to egocentric (first-person) view image generation.

Image Generation

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation

1 code implementation 3 Feb 2020 Hao Tang, Dan Xu, Yan Yan, Jason J. Corso, Philip H. S. Torr, Nicu Sebe

In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results.

Image-to-Image Translation Translation

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

2 code implementations CVPR 2020 Hao Tang, Dan Xu, Yan Yan, Philip H. S. Torr, Nicu Sebe

To tackle this issue, in this work we consider learning the scene generation in a local context, and correspondingly design a local class-specific generative network with semantic maps as a guidance, which separately constructs and learns sub-generators concentrating on the generation of different classes, and is able to provide more scene details.

Image Generation Scene Generation

Asymmetric Generative Adversarial Networks for Image-to-Image Translation

1 code implementation 14 Dec 2019 Hao Tang, Dan Xu, Hong Liu, Nicu Sebe

In this paper, we analyze the limitation of the existing symmetric GAN models in asymmetric translation tasks, and propose an AsymmetricGAN model with both translation and reconstruction generators of unequal sizes and different parameter-sharing strategy to adapt to the asymmetric need in both unsupervised and supervised image-to-image translation tasks.

Image-to-Image Translation Translation

Unified Generative Adversarial Networks for Controllable Image-to-Image Translation

1 code implementation 12 Dec 2019 Hao Tang, Hong Liu, Nicu Sebe

The proposed model consists of a single generator and a discriminator taking a conditional image and the target controllable structure as input.

Facial Expression Translation Gesture-to-Gesture Translation +2

AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks

2 code implementations 27 Nov 2019 Hao Tang, Hong Liu, Dan Xu, Philip H. S. Torr, Nicu Sebe

State-of-the-art methods in image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data.

Image-to-Image Translation Translation

Improving Semantic Segmentation of Aerial Images Using Patch-based Attention

no code implementations 20 Nov 2019 Lei Ding, Hao Tang, Lorenzo Bruzzone

High-level features extracted from the late layers of a neural network are rich in semantic information, yet have blurred spatial details; low-level features extracted from the early layers of a network contain more pixel-level information, but are isolated and noisy.

Semantic Segmentation

Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

1 code implementation 2 Aug 2019 Hao Tang, Dan Xu, Gaowen Liu, Wei Wang, Nicu Sebe, Yan Yan

In this work, we propose a novel Cycle In Cycle Generative Adversarial Network (C$^2$GAN) for the task of keypoint-guided image generation.

Image Generation

NoduleNet: Decoupled False Positive Reduction for Pulmonary Nodule Detection and Segmentation

1 code implementation 25 Jul 2019 Hao Tang, Chupeng Zhang, Xiaohui Xie

Pulmonary nodule detection, false positive reduction and segmentation represent three of the most common tasks in the computer-aided analysis of chest CT images.

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation

1 code implementation 3 Jul 2019 Bin Duan, Wei Wang, Hao Tang, Hugo Latapie, Yan Yan

However, in machine learning, this cross-modal learning is a nontrivial task because different modalities have no homogeneous properties.

Translation

Relating Simple Sentence Representations in Deep Neural Networks and the Brain

1 code implementation ACL 2019 Sharmistha Jat, Hao Tang, Partha Talukdar, Tom Mitchell

To the best of our knowledge, this is the first work showing that the MEG brain recording when reading a word in a sentence can be used to distinguish earlier words in the sentence.

GazeCorrection: Self-Guided Eye Manipulation in the wild using Self-Supervised Generative Adversarial Networks

1 code implementation arXiv 2019 Jichao Zhang, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe

Gaze correction aims to redirect the person's gaze into the camera by manipulating the eye region, and it can be considered as a specific image resynthesis problem.

Resynthesis

Expression Conditional GAN for Facial Expression-to-Expression Translation

no code implementations 14 May 2019 Hao Tang, Wei Wang, Songsong Wu, Xinya Chen, Dan Xu, Nicu Sebe, Yan Yan

In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another one based on an additional expression attribute.

Facial Expression Translation Translation

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

no code implementations 11 May 2019 Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV).

automatic-speech-recognition Contrastive Learning +2

Structured Discriminative Tensor Dictionary Learning for Unsupervised Domain Adaptation

no code implementations 11 May 2019 Songsong Wu, Yan Yan, Hao Tang, Jianjun Qian, Jian Zhang, Xiao-Yuan Jing

However, the number of labeled source samples is always limited in practice due to expensive annotation cost, which leads to sub-optimal performance.

Dictionary Learning Unsupervised Domain Adaptation

Joint Learning of Self-Representation and Indicator for Multi-View Image Clustering

no code implementations 11 May 2019 Songsong Wu, Zhiqiang Lu, Hao Tang, Yan Yan, Songhao Zhu, Xiao-Yuan Jing, Zuoyong Li

Multi-view subspace clustering aims to divide a set of multisource data into several groups according to their underlying subspace structure.

Multi-view Subspace Clustering

Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

3 code implementations CVPR 2019 Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan

In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map.

Bird View Synthesis Cross-View Image-to-Image Translation +1

VoiceID Loss: Speech Enhancement for Speaker Verification

no code implementations 7 Apr 2019 Suwon Shon, Hao Tang, James Glass

In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.

Speaker Verification Speech Enhancement

An Unsupervised Autoregressive Model for Speech Representation Learning

5 code implementations 5 Apr 2019 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.

General Classification Representation Learning +1

Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation

5 code implementations 28 Mar 2019 Hao Tang, Dan Xu, Nicu Sebe, Yan Yan

To handle the limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes of unwanted part for semantic manipulation problems without using extra data and models.

Translation Unsupervised Image-To-Image Translation

Automated pulmonary nodule detection using 3D deep convolutional neural networks

no code implementations 23 Mar 2019 Hao Tang, Daniel R. Kim, Xiaohui Xie

Finally, we introduce a method to ensemble models from both stages via consensus to give the final predictions.

Computed Tomography (CT) Object Detection

Automatic Pulmonary Lobe Segmentation Using Deep Learning

1 code implementation 23 Mar 2019 Hao Tang, Chupeng Zhang, Xiaohui Xie

To validate the robustness and performance of our proposed framework trained with a small number of training examples, we further tested our model on CT scans from an independent dataset.

Computed Tomography (CT)

An End-to-end Framework For Integrated Pulmonary Nodule Detection and False Positive Reduction

no code implementations 23 Mar 2019 Hao Tang, Xingwei Liu, Xiaohui Xie

Most of the existing deep learning nodule detection systems are constructed in two steps: a) nodule candidates screening and b) false positive reduction, using two different models trained separately.

Computed Tomography (CT)

Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling

1 code implementation 31 Jan 2019 Greg Olmschenk, Hao Tang, Zhigang Zhu

Gatherings of thousands to millions of people frequently occur for an enormous variety of events, and automated counting of these high-density crowds is useful for safety, management, and measuring significance of an event.

Crowd Counting

Attribute-Guided Sketch Generation

1 code implementation 28 Jan 2019 Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan

To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN) which is an end-to-end framework and contains two pairs of generators and discriminators, one of which is used to generate faces with attributes while the other one is employed for image-to-sketch translation.

Translation

Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion

1 code implementation 15 Jan 2019 Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe

Gesture recognition is a hot topic in computer vision and pattern recognition, which plays a vitally important role in natural human-computer interface.

Hand Gesture Recognition Hand-Gesture Recognition

Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

1 code implementation 14 Jan 2019 Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe

State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data.

Image-to-Image Translation Translation

Generalizing semi-supervised generative adversarial networks to regression using feature contrasting

no code implementations 27 Nov 2018 Greg Olmschenk, Zhigang Zhu, Hao Tang

We first demonstrate the capabilities of semi-supervised regression GANs on a toy dataset which allows for a detailed understanding of how they operate in various circumstances.

Age Estimation Crowd Counting +1

On The Inductive Bias of Words in Acoustics-to-Word Models

no code implementations 31 Oct 2018 Hao Tang, James Glass

In addition, we study three types of inductive bias, leveraging a pronunciation dictionary, word boundary annotations, and constraints on word durations.

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

1 code implementation 12 Sep 2018 Suwon Shon, Hao Tang, James Glass

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.

Speaker Recognition Text-Independent Speaker Recognition

Deep Micro-Dictionary Learning and Coding Network

1 code implementation 11 Sep 2018 Hao Tang, Heng Wei, Wei Xiao, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe

In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN).

Dictionary Learning

Integrated Server for Measurement-Device-Independent Quantum Key Distribution Network

no code implementations 26 Aug 2018 Ci-Yu Wang, Jun Gao, Zhi-Qiang Jiao, Lu-Feng Qiao, Ruo-Jing Ren, Zhen Feng, Yuan Chen, Zeng-Quan Yan, Yao Wang, Hao Tang, Xian-Min Jin

Quantum key distribution (QKD), harnessing quantum physics and optoelectronics, may promise unconditionally secure information exchange in theory.

Quantum Physics

GestureGAN for Hand Gesture-to-Gesture Translation in the Wild

1 code implementation 14 Aug 2018 Hao Tang, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe

Therefore, this task requires a high-level understanding of the mapping between the input source gesture and the output target gesture.

Data Augmentation Gesture-to-Gesture Translation +1

On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition

no code implementations 9 Jul 2018 Hao Tang, James Glass

In this paper, we study recurrent networks' ability to learn long-term dependency in the context of speech recognition.

Speech Recognition

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition

no code implementations 13 Jun 2018 Wei-Ning Hsu, Hao Tang, James Glass

However, it is relatively inexpensive to collect large amounts of unlabeled data from domains that we want the models to generalize to.

automatic-speech-recognition Speech Recognition

A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition

no code implementations 13 Jun 2018 Hao Tang, Wei-Ning Hsu, Francois Grondin, James Glass

Speech recognizers trained on close-talking speech do not generalize to distant speech and the word error rate degradation can be as large as 40% absolute.

Data Augmentation Distant Speech Recognition +2

Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation

1 code implementation CVPR 2018 Dan Xu, Wei Wang, Hao Tang, Hong Liu, Nicu Sebe, Elisa Ricci

Recent works have shown the benefit of integrating Conditional Random Fields (CRFs) models into deep architectures for improving pixel-level prediction tasks.

Monocular Depth Estimation

Sequence Prediction with Neural Segmental Models

no code implementations 5 Sep 2017 Hao Tang

We explore end-to-end training for segmental models with various loss functions, and show how end-to-end training with marginal log loss can eliminate the need for detailed manual alignments.

General Classification

End-to-End Neural Segmental Models for Speech Recognition

no code implementations 1 Aug 2017 Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals

Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.

Speech Recognition

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition

no code implementations 5 Apr 2017 Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu

We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches.

Speech Recognition

End-to-End Training Approaches for Discriminative Segmental Models

no code implementations 21 Oct 2016 Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training.

Speech Recognition

Lexicon-Free Fingerspelling Recognition from Video: Data, Models, and Signer Adaptation

no code implementations 26 Sep 2016 Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu

Recognizing fingerspelling is challenging for a number of reasons: It involves quick, small motions that are often highly coarticulated; it exhibits significant variation between signers; and there has been a dearth of continuous fingerspelling data collected.

Efficient Segmental Cascades for Speech Recognition

no code implementations 2 Aug 2016 Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition.

Speech Recognition

Signer-independent Fingerspelling Recognition with Deep Neural Network Adaptation

no code implementations 13 Feb 2016 Taehwan Kim, Weiran Wang, Hao Tang, Karen Livescu

Previous work has shown that it is possible to achieve almost 90% accuracies on fingerspelling recognition in a signer-dependent setting.

automatic-speech-recognition Speech Recognition

Discriminative Segmental Cascades for Feature-Rich Phone Recognition

no code implementations 22 Jul 2015 Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass.

Language Modelling Speech Recognition +1
