Search Results for author: Yiming Wang

Found 82 papers, 32 papers with code

Noise-injected Consistency Training and Entropy-constrained Pseudo Labeling for Semi-supervised Extractive Summarization

1 code implementation • COLING 2022 • Yiming Wang, Qianren Mao, Junnan Liu, Weifeng Jiang, Hongdong Zhu, JianXin Li

Labeling large amounts of extractive summarization data is often prohibitively expensive due to time, financial, and expertise constraints, which poses great challenges to incorporating summarization systems in practical applications.

Extractive Summarization

Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts

no code implementations • Findings (EMNLP) 2021 • Yiming Wang, Ximing Li, Xiaotang Zhou, Jihong Ouyang

Short text nowadays has become a more fashionable form of text data, e.g., Twitter posts, news titles, and product reviews.

Topic Models Variational Inference +1

Light-weight Retinal Layer Segmentation with Global Reasoning

no code implementations • 25 Apr 2024 • Xiang He, Weiye Song, Yiming Wang, Fabio Poiesi, Ji Yi, Manishi Desai, Quanqing Xu, Kongzheng Yang, Yi Wan

Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases.

Vocabulary-free Image Classification and Semantic Segmentation

1 code implementation • 16 Apr 2024 • Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci

To address VIC, we propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.

Classification Image Classification +4

Test-Time Zero-Shot Temporal Action Localization

1 code implementation • 8 Apr 2024 • Benedetta Liberatori, Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci

To this aim, we introduce a novel method that performs Test-Time adaptation for Temporal Action Localization (T3AL).

Language Modelling Pseudo Label +3

Harnessing Large Language Models for Training-free Video Anomaly Detection

no code implementations • 1 Apr 2024 • Luca Zanella, Willi Menapace, Massimiliano Mancini, Yiming Wang, Elisa Ricci

Video anomaly detection (VAD) aims to temporally locate abnormal events in a video.

Anomaly Detection Video Anomaly Detection

Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

no code implementations • 15 Mar 2024 • Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang

Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset.

Navigate Vision and Language Navigation

One for all: A novel Dual-space Co-training baseline for Large-scale Multi-View Clustering

no code implementations • 28 Jan 2024 • Zisen Kong, Zhiqiang Fu, Dongxia Chang, Yiming Wang, Yao Zhao

We jointly optimize the construction of the latent consistent anchor graph and the feature transformation to generate a discriminative anchor graph.

Clustering

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

1 code implementation • 18 Jan 2024 • Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu

We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records.

Benchmarking

Novel class discovery meets foundation models for 3D semantic segmentation

no code implementations • 6 Dec 2023 • Luigi Riz, Cristiano Saltori, Yiming Wang, Elisa Ricci, Fabio Poiesi

Firstly, it introduces the novel task of NCD for point cloud semantic segmentation.

3D Semantic Segmentation Novel Class Discovery +2

Diversified in-domain synthesis with efficient fine-tuning for few-shot classification

1 code implementation • 5 Dec 2023 • Victor G. Turrisi da Costa, Nicola Dall'Asen, Yiming Wang, Nicu Sebe, Elisa Ricci

Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.

Few-Shot Image Classification Few-Shot Learning +3

Collaborative Neural Painting

no code implementations • 4 Dec 2023 • Nicola Dall'Asen, Willi Menapace, Elia Peruzzo, Enver Sangineto, Yiming Wang, Elisa Ricci

The process of painting fosters creativity and rational planning.

Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding

1 code implementation • 4 Dec 2023 • Guofeng Mei, Luigi Riz, Yiming Wang, Fabio Poiesi

Zero-shot 3D point cloud understanding can be achieved via 2D Vision-Language Models (VLMs).

Segmentation Semantic Segmentation

RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation

no code implementations • 23 Nov 2023 • Yiming Wang, Yuxuan Song, Minkai Xu, Rui Wang, Hao Zhou, WeiYing Ma

Our key innovation is to develop a multi-stage diffusion process.

Graph Generation Retrosynthesis

PrivateLoRA For Efficient Privacy Preserving LLM

no code implementations • 23 Nov 2023 • Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang

To our knowledge, our proposed framework is the first efficient and privacy-preserving LLM solution in the literature.

Language Modelling Large Language Model +1

Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents

1 code implementation • 20 Nov 2023 • Zhuosheng Zhang, Yao Yao, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, Hai Zhao

Large language models (LLMs) have dramatically enhanced the field of language intelligence, as demonstrably evidenced by their formidable empirical performance across a spectrum of complex reasoning tasks.

MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

no code implementations • 20 Nov 2023 • Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang

Further investigation into weight update matrices of MultiLoRA exhibits reduced dependency on top singular vectors and more democratic unitary transform contributions.

Multi-Task Learning Natural Language Understanding +1

Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression

no code implementations • 19 Oct 2023 • Yiming Wang, Qian Huang, Bin Tang, Huashan Sun, Xing Li

In addition, most approaches ignore the spatial and channel redundancy.

Motion Compensation Motion Estimation +4

Delving into CLIP latent space for Video Anomaly Recognition

1 code implementation • 4 Oct 2023 • Luca Zanella, Benedetta Liberatori, Willi Menapace, Fabio Poiesi, Yiming Wang, Elisa Ricci

We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision.

Anomaly Detection Multiple Instance Learning +1

ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers

no code implementations • 3 Oct 2023 • Yiming Wang, Jinyu Li

In this paper, we aim to reduce model size by reparameterizing model weights across Transformer encoder layers and assuming a special weight composition and structure.

speech-recognition Speech Recognition

LCReg: Long-Tailed Image Classification with Latent Categories based Recognition

no code implementations • 13 Sep 2023 • Weide Liu, Zhonghua Wu, Yiming Wang, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin

In this work, we tackle the challenging problem of long-tailed image recognition.

Data Augmentation Image Classification

Survey on video anomaly detection in dynamic scenes with moving cameras

no code implementations • 14 Aug 2023 • Runyu Jiao, Yi Wan, Fabio Poiesi, Yiming Wang

The increasing popularity of compact and inexpensive cameras, e.g., dash cameras, body cameras, and cameras equipped on robots, has sparked a growing interest in detecting anomalies within dynamic scenes recorded by moving cameras.

Anomaly Detection Video Anomaly Detection

Attentive Multimodal Fusion for Optical and Scene Flow

1 code implementation • 28 Jul 2023 • Youjie Zhou, Guofeng Mei, Yiming Wang, Fabio Poiesi, Yi Wan

This paper presents an investigation into the estimation of optical and scene flow using RGBD information in scenarios where the RGB modality is affected by noise or captured in dark environments.

Meta-Reasoning: Semantics-Symbol Deconstruction for Large Language Models

1 code implementation • 30 Jun 2023 • Yiming Wang, Zhuosheng Zhang, Pei Zhang, Baosong Yang, Rui Wang

Neural-symbolic methods have demonstrated efficiency in enhancing the reasoning abilities of large language models (LLMs).

Domain Generalization In-Context Learning +1

Self-Enhancement Improves Text-Image Retrieval in Foundation Visual-Language Models

1 code implementation • 11 Jun 2023 • Yuguang Yang, Yiming Wang, Shupeng Geng, Runqi Wang, Yimi Wang, Sheng Wu, Baochang Zhang

The emergence of cross-modal foundation models has introduced numerous approaches grounded in text-image retrieval.

Attribute Image Retrieval +2

Vocabulary-free Image Classification

1 code implementation • NeurIPS 2023 • Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci

We thus formalize a novel task, termed as Vocabulary-free Image Classification (VIC), where we aim to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary.

Classification Image Classification +4

Audio-Visual Dataset and Method for Anomaly Detection in Traffic Videos

1 code implementation • 24 May 2023 • Błażej Leporowski, Arian Bakhtiarnia, Nicole Bonnici, Adrian Muscat, Luca Zanella, Yiming Wang, Alexandros Iosifidis

We introduce the first audio-visual dataset for traffic anomaly detection taken from real-world scenes, called MAVAD, with a diverse range of weather and illumination conditions.

Anomaly Detection

Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method

1 code implementation • 22 May 2023 • Yiming Wang, Zhuosheng Zhang, Rui Wang

Further, we propose a Summary Chain-of-Thought (SumCoT) technique to elicit LLMs to generate summaries step by step, which helps them integrate more fine-grained details of source documents into the final summaries that correlate with the human writing mindset.

Benchmarking Hallucination

Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models

1 code implementation • 20 Mar 2023 • Francesco Giuliari, Gianluca Scarpellini, Stuart James, Yiming Wang, Alessio Del Bue

We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning.

Sentence Sentence Ordering +1

3DSGrasp: 3D Shape-Completion for Robotic Grasp

no code implementations • 2 Jan 2023 • Seyed S. Mohammadi, Nuno F. Duarte, Dimitris Dimou, Yiming Wang, Matteo Taiana, Pietro Morerio, Atabak Dehban, Plinio Moreno, Alexandre Bernardino, Alessio Del Bue, Jose Santos-Victor

However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses.

Robotic Grasping

Label-Guided Knowledge Distillation for Continual Semantic Segmentation on 2D Images and 3D Point Clouds

1 code implementation • ICCV 2023 • Ze Yang, Ruibo Li, Evan Ling, Chi Zhang, Yiming Wang, Dezhao Huang, Keng Teck Ma, Minhoe Hur, Guosheng Lin

To address this issue, we propose a new label-guided knowledge distillation (LGKD) loss, where the old model output is expanded and transplanted (with the guidance of the ground truth label) to form a semantically appropriate class correspondence with the new model output.

Ranked #1 on Continual Semantic Segmentation on ScanNet

Continual Semantic Segmentation Knowledge Distillation +1

Learning with linear mixed model for group recommendation systems

no code implementations • 17 Dec 2022 • Baode Gao, Guangpeng Zhan, Hanzhang Wang, Yiming Wang, Shengxin Zhu

Accurate prediction of users' responses to items is one of the main aims of many computational advising applications.

Recommendation Systems

NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction

1 code implementation • ICCV 2023 • Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, Lingjie Liu

Recent methods for neural surface representation and rendering, for example NeuS, have demonstrated the remarkably high-quality reconstruction of static scenes.

Surface Reconstruction

Query Your Model with Definitions in FrameNet: An Effective Method for Frame Semantic Role Labeling

1 code implementation • 5 Dec 2022 • Ce Zheng, Yiming Wang, Baobao Chang

Such methods usually model role classification as naive multi-class classification and treat arguments individually, which neglects label semantics and interactions between arguments and thus hinders the performance and generalization of models.

Classification Multi-class Classification +1

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

1 code implementation • 18 Nov 2022 • Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu

We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas.

Code Generation Memorization

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

no code implementations • 10 Nov 2022 • Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion.

Representation Learning Self-Supervised Learning +2

Cross-view Graph Contrastive Representation Learning on Partially Aligned Multi-view Data

no code implementations • 8 Nov 2022 • Yiming Wang, Dongxia Chang, Zhiqiang Fu, Jie Wen, Yao Zhao

Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.

Contrastive Learning Representation Learning

Leveraging commonsense for object localisation in partial scenes

no code implementations • 1 Nov 2022 • Francesco Giuliari, Geri Skenderi, Marco Cristani, Alessio Del Bue, Yiming Wang

With the proposed graph-based scene representation, we estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism.

Object Position

Oracle-guided Contrastive Clustering

no code implementations • 1 Nov 2022 • Mengdie Wang, Liyuan Shang, Suyun Zhao, Yiming Wang, Hong Chen, Cuiping Li, XiZhao Wang

Accordingly, the query results, guided by oracles with distinctive demands, may drive the OCC's clustering results in a desired orientation.

Active Learning Clustering +2

ConfMix: Unsupervised Domain Adaptation for Object Detection via Confidence-based Mixing

1 code implementation • 20 Oct 2022 • Giulio Mattolin, Luca Zanella, Elisa Ricci, Yiming Wang

Unsupervised Domain Adaptation (UDA) for object detection aims to adapt a model trained on a source domain to detect instances from a new target domain for which annotations are not available.

Object Detection Unsupervised Domain Adaptation

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives

no code implementations • 16 Oct 2022 • Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li

We examine CTCBERT on IDs from HuBERT Iter1, HuBERT Iter2, and PBERT.

Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition

1 code implementation • 11 Oct 2022 • Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci

Automatically understanding emotions from visual data is a fundamental task for human behaviour understanding.

Cross-Domain Facial Expression Recognition Facial Expression Recognition (FER) +2

Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors

no code implementations • 25 Aug 2022 • Yiming Wang, Qingzhe Gao, Libin Liu, Lingjie Liu, Christian Theobalt, Baoquan Chen

The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.

Attribute

PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation

1 code implementation • 9 Jul 2022 • Bin Ren, Hao Tang, Yiming Wang, Xia Li, Wei Wang, Nicu Sebe

For semantic-guided cross-view image translation, it is crucial to learn where to sample pixels from the source view image and where to reallocate them guided by the target view semantic map, especially when there is little overlap or drastic view difference between the source and target images.

Generative Adversarial Network

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

no code implementations • 21 Jun 2022 • Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Long-tailed Recognition by Learning from Latent Categories

no code implementations • 2 Jun 2022 • Weide Liu, Zhonghua Wu, Yiming Wang, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin

Previous long-tailed recognition methods commonly focus on the data augmentation or re-balancing strategy of the tail classes to give more attention to tail classes during the model training.

Ranked #9 on Long-tail Learning on CIFAR-10-LT (ρ=10)

Data Augmentation Long-tail Learning

Spatial Commonsense Graph for Object Localisation in Partial Scenes

1 code implementation • CVPR 2022 • Francesco Giuliari, Geri Skenderi, Marco Cristani, Yiming Wang, Alessio Del Bue

The SCG is used to estimate the unknown position of the target object in two steps: first, we feed the SCG into a novel Proximity Prediction Network, a graph neural network that uses attention to perform distance prediction between the node representing the target object and the nodes representing the observed objects in the SCG; second, we propose a Localisation Module based on circular intersection to estimate the object position using all the predicted pairwise distances in order to be independent of any reference system.
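The second step above, turning predicted pairwise distances into a position, can be illustrated with a plain least-squares circle intersection. This is only a minimal sketch and not the paper's Localisation Module: it assumes 2D coordinates for the observed objects are available in some frame, and the function name `locate_from_distances` and its arguments are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def locate_from_distances(anchors: List[Point], distances: List[float]) -> Point:
    """Estimate a 2D position from distances to known anchor points.

    Illustrative assumption: anchors are observed-object positions and
    distances are the predicted target-to-object distances.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least three anchors and one distance per anchor")

    (x0, y0), d0 = anchors[0], distances[0]

    # Subtracting the first circle equation from every other one cancels the
    # quadratic terms and leaves linear equations a*x + b*y = c.
    rows = []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = d0 ** 2 - di ** 2 + xi ** 2 + yi ** 2 - x0 ** 2 - y0 ** 2
        rows.append((a, b, c))

    # Solve the 2x2 normal equations (A^T A) p = A^T c with Cramer's rule.
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not unique")

    x = (sac * sbb - sab * sbc) / det
    y = (saa * sbc - sab * sac) / det
    return x, y


if __name__ == "__main__":
    # Three anchors and exact distances to a point chosen for illustration.
    anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    distances = [2 ** 0.5, 10 ** 0.5, 5 ** 0.5]
    print(locate_from_distances(anchors, distances))
```

With noisy predicted distances the circles no longer meet in a single point, which is why the sketch solves the linearised system in the least-squares sense rather than intersecting two circles exactly.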

Object Position

Behavior Recognition Based on the Integration of Multigranular Motion Features

no code implementations • 7 Mar 2022 • Lizong Zhang, Yiming Wang, Bei Hui, Xiujian Zhang, Sijuan Liu, Shuxin Feng

Specifically, behavior recognition may even rely more on the modeling of temporal information containing short-range and long-range motions; this contrasts with computer vision tasks involving images that focus on the understanding of spatial information.

Action Recognition

ACTIVE: Augmentation-Free Graph Contrastive Learning for Partial Multi-View Clustering

no code implementations • 1 Mar 2022 • Yiming Wang, Dongxia Chang, Zhiqiang Fu, Jie Wen, Yao Zhao

In this paper, we propose an augmentation-free graph contrastive learning framework, namely ACTIVE, to solve the problem of partial multi-view clustering.

Clustering Contrastive Learning +1

Graph-based Generative Face Anonymisation with Pose Preservation

1 code implementation • 10 Dec 2021 • Nicola Dall'Asen, Yiming Wang, Hao Tang, Luca Zanella, Elisa Ricci

With the goal of maintaining the geometric attributes of the source face, i.e., the facial pose and expression, and of promoting more natural face generation, we propose to exploit a Bipartite Graph to explicitly model the relations between the facial landmarks of the source identity and the ones of the condition identity through a deep model.

Face Detection Face Generation

Incomplete Multi-view Clustering via Cross-view Relation Transfer

no code implementations • 1 Dec 2021 • Yiming Wang, Dongxia Chang, Zhiqiang Fu, Yao Zhao

In this paper, we consider the problem of multi-view clustering on incomplete views.

Clustering Incomplete multi-view clustering +1

Loop closure detection using local 3D deep descriptors

1 code implementation • 31 Oct 2021 • Youjie Zhou, Yiming Wang, Fabio Poiesi, Qi Qin, Yi Wan

We compare our L3D-based loop closure approach with recent approaches on LiDAR data and achieve state-of-the-art loop closure detection accuracy.

Loop Closure Detection

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

no code implementations • 28 Oct 2021 • Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

The reconstruction module is used for auxiliary learning to improve the noise robustness of the learned representation and thus is not required during inference.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +8

Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition

no code implementations • 11 Oct 2021 • Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu

In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

PointView-GCN: 3D Shape Classification with Multi-View Point Clouds

1 code implementation • IEEE International Conference on Image Processing 2021 • Seyed Saber Mohammadi, Yiming Wang, Alessio Del Bue

We address 3D shape classification with partial point cloud inputs captured from multiple viewpoints around the object.

Ranked #2 on 3D Point Cloud Classification on ModelNet40

3D Point Cloud Classification 3D Shape Classification +2

Double Low-Rank Representation With Projection Distance Penalty for Clustering

no code implementations • CVPR 2021 • Zhiqiang Fu, Yao Zhao, Dongxia Chang, Xingxing Zhang, Yiming Wang

This paper presents a novel, simple yet robust self-representation method, i.e., Double Low-Rank Representation with Projection Distance penalty (DLRRPD) for clustering.

Clustering

Consistent Multiple Graph Embedding for Multi-View Clustering

no code implementations • 11 May 2021 • Yiming Wang, Dongxia Chang, Zhiqiang Fu, Yao Zhao

Specifically, a multiple graph auto-encoder (M-GAE) is designed to flexibly encode the complementary information of multi-view data using a multi-graph attention fusion encoder.

Clustering Graph Attention +1

Seeing All From a Few: Nodes Selection Using Graph Pooling for Graph Clustering

no code implementations • 30 Apr 2021 • Yiming Wang, Dongxia Chang, Zhiqian Fu, Yao Zhao

This paper is the first attempt to employ a graph pooling technique for node clustering, and we propose a novel dual graph embedding network (DGEN), which is designed as a two-step graph encoder connected by a graph pooling layer to learn the graph embedding.

Clustering Graph Clustering +2

Auto-weighted low-rank representation for clustering

no code implementations • 26 Apr 2021 • Zhiqiang Fu, Yao Zhao, Dongxia Chang, Xingxing Zhang, Yiming Wang

In this paper, a novel unsupervised low-rank representation model, i.e., Auto-weighted Low-Rank Representation (ALRR), is proposed to construct a more favorable similarity graph (SG) for clustering.

Clustering Representation Learning

Wake Word Detection with Streaming Transformers

no code implementations • 8 Feb 2021 • Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Modern wake word detection systems usually rely on neural networks for acoustic modeling.

From Point to Space: 3D Moving Human Pose Estimation Using Commodity WiFi

no code implementations • 28 Dec 2020 • Yiming Wang, Lingchao Guo, Zhaoming Lu, Xiangming Wen, Shuang Zhou, Wanyu Meng

To reconstruct 3D poses of people who move throughout the space rather than a fixed point, we fuse the amplitude and phase into Channel State Information (CSI) images which can provide both pose and position information.

3D Pose Estimation Position

Subject-independent Human Pose Image Construction with Commodity Wi-Fi

no code implementations • 22 Dec 2020 • Shuang Zhou, Lingchao Guo, Zhaoming Lu, Xiangming Wen, Wei Zheng, Yiming Wang

Existing papers achieve good results when constructing the images of subjects who are in the prior training samples.

Where to Explore Next? ExHistCNN for History-aware Autonomous 3D Exploration

1 code implementation • ECCV 2020 • Yiming Wang, Alessio Del Bue

In this work we address the problem of autonomous 3D exploration of an unknown indoor environment using a depth camera.

3D Reconstruction

Single Image Human Proxemics Estimation for Visual Social Distancing

1 code implementation • 3 Nov 2020 • Maya Aghaei, Matteo Bustreo, Yiming Wang, Gianluca Bailo, Pietro Morerio, Alessio Del Bue

In this work, we address the problem of estimating the so-called "Social Distancing" given a single uncalibrated image in unconstrained scenarios.

POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments

no code implementations • 17 Sep 2020 • Yiming Wang, Francesco Giuliari, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

Our POMP method uses as input the current pose of an agent (e.g., a robot) and an RGB-D frame.

Motion Planning object-detection +1

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

1 code implementation • 20 May 2020 • Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called chain models in the Kaldi automatic speech recognition (ASR) toolkit.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Wake Word Detection with Alignment-Free Lattice-Free MMI

1 code implementation • 17 May 2020 • Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Always-on spoken language interfaces, e.g., personal digital assistants, rely on a wake word to start processing spoken input.

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation • 18 Sep 2019 • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

Ranked #1 on Speech Recognition on Hub5'00 CallHome

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings

no code implementations • WS 2019 • Mahsa Yarmohammadi, Xutai Ma, Sorami Hisamoto, Muhammad Rahman, Yiming Wang, Hainan Xu, Daniel Povey, Philipp Koehn, Kevin Duh

Cross-Lingual Information Retrieval Retrieval

End-to-end Anchored Speech Recognition

no code implementations • 6 Feb 2019 • Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister

The anchored segment refers to the wake-up word part of an audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise.

Multi-Task Learning speech-recognition +1

An Empirical Study of Machine Translation for the Shared Task of WMT18

no code implementations • WS 2018 • Chao Bei, Hao Zong, Yiming Wang, Baoyong Fan, Shiqi Li, Conghu Yuan

The submitted system focuses on data clearing and techniques to build a competitive model for this task.

Chinese Word Segmentation Language Modelling +2

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

1 code implementation • Interspeech 2018 • Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur

Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.

speech-recognition Speech Recognition

Neural Network Language Modeling with Letter-based Features and Importance Sampling

no code implementations • ICASSP 2018 • Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

Ranked #36 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A GPU-based WFST Decoder with Exact Lattice Generation

no code implementations • 9 Apr 2018 • Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs).

Scheduling

Purely sequence-trained neural networks for ASR based on lattice-free MMI

no code implementations • INTERSPEECH 2016 • Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahrmani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur

Models trained with LFMMI provide a relative word error rate reduction of ∼11.5% over those trained with the cross-entropy objective function, and ∼8% over those trained with cross-entropy and sMBR objective functions.
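A relative word error rate (WER) reduction expresses the improvement as a fraction of the baseline WER rather than as an absolute difference. The snippet below is a small sketch of that arithmetic; the function name and the WER values in the usage example are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only: a baseline at 10.0% WER
    # and a new system at 8.85% WER give a relative reduction of about 11.5%.
    print(round(relative_wer_reduction(10.0, 8.85), 2))
```

The same absolute gain counts for more when the baseline is already low, which is why speech recognition results are often reported in relative terms.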

Ranked #4 on Speech Recognition on WSJ eval92