An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations 13 Feb 2024 Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

2 code implementations 23 Dec 2023 Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

1 code implementation 8 Nov 2023 Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach.

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

1 code implementation 1 Oct 2023 Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang

Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.

Referring Expression

Exploring RWKV for Memory Efficient and Low Latency Streaming ASR

no code implementations 26 Sep 2023 Keyu An, Shiliang Zhang

Recently, self-attention-based transformers and conformers have been introduced as alternatives to RNNs for ASR acoustic modeling.


Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

no code implementations 19 Sep 2023 Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

In this paper, we explored how to boost speech emotion recognition (SER) with a state-of-the-art speech pre-trained model (PTM), data2vec; a text generation technique, GPT-4; and a speech synthesis technique, Azure TTS.

Data Augmentation Language Modelling +5

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

no code implementations 14 Sep 2023 Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

In spite of the excellent strides made by end-to-end (E2E) models in speech recognition in recent years, named entity recognition is still challenging but critical for semantic understanding.

Language Modelling named-entity-recognition +3

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

1 code implementation 14 Sep 2023 Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

Automatic Speech Recognition speech-recognition +3

MixBCT: Towards Self-Adapting Backward-Compatible Training

1 code implementation 14 Aug 2023 Yu Liang, Shiliang Zhang, YaoWei Wang, Sheng Xiao, Kenli Li, Xiaoyu Wang

As a solution, backward-compatible training can be employed to avoid the necessity of updating old retrieval datasets.

Face Recognition Image Retrieval +1

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

2 code implementations 7 Aug 2023 Xian Shi, Yexin Yang, Zerui Li, Yanni Chen, Zhifu Gao, Shiliang Zhang

It combines the accuracy of AED-based models, the efficiency of NAR models, and an explicit customization capacity with superior performance.

Adaptive robust tracking control with active learning for linear systems with ellipsoidal bounded uncertainties

no code implementations 7 Aug 2023 Xuehui Ma, Shiliang Zhang, Yushuai Li, Fucai Qian, TingWen Huang

This paper is concerned with the robust tracking control of linear uncertain systems, whose unknown system parameters and disturbances are bounded within ellipsoidal sets.

Active Learning

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

1 code implementation 5 Aug 2023 Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang

It assigns representation of augmented views of utterances to the same prototypes as the representation of the original view, thereby enabling effective knowledge transfer between the views.

Representation Learning Speaker Verification +1

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

no code implementations 23 May 2023 Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

CASA-ASR: Context-Aware Speaker-Attributed ASR

no code implementations 21 May 2023 Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

In addition, a two-pass decoding strategy is further proposed to fully leverage the contextual modeling ability resulting in a better recognition performance.

Automatic Speech Recognition speech-recognition +1

BAT: Boundary aware transducer for memory-efficient and low-latency ASR

1 code implementation 19 May 2023 Keyu An, Xian Shi, Shiliang Zhang

Recently, the recurrent neural network transducer (RNN-T) has gained increasing popularity due to its natural streaming capability as well as its superior performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR)

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

1 code implementation 18 May 2023 Zhifu Gao, Zerui Li, JiaMing Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications.

 Ranked #1 on Speech Recognition on WenetSpeech (using extra training data)

Action Detection Activity Detection +2

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

no code implementations 18 May 2023 Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan

Estimating confidence scores for recognition results is a classic task in the ASR field and is vitally important for many kinds of downstream tasks and training strategies.

speech-recognition Speech Recognition

TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

1 code implementation 8 Mar 2023 JiaMing Wang, Zhihao Du, Shiliang Zhang

Recently, end-to-end neural diarization (EEND) has been introduced and achieves promising results in speaker-overlapped scenarios.

speaker-diarization Speaker Diarization +1

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

1 code implementation 29 Jan 2023 Xian Shi, Yanni Chen, Shiliang Zhang, Zhijie Yan

Conventional ASR systems use frame-level phoneme posteriors to conduct forced alignment (FA) and provide timestamps, while end-to-end ASR systems, especially AED-based ones, lack this ability.

Evolved Part Masking for Self-Supervised Learning

no code implementations CVPR 2023 Zhanzhou Feng, Shiliang Zhang

The accuracy of partitioned parts is on par with the capability of the pre-trained model, leading to evolved mask patterns at different training stages.

Image Classification Object +4

3D Human Mesh Recovery with Sequentially Global Rotation Estimation

1 code implementation ICCV 2023 Dongkai Wang, Shiliang Zhang

This pipeline needs to transform each relative rotation matrix into a global rotation matrix to articulate the canonical mesh, and suffers from accumulated errors along the kinematics chain.

Human Mesh Recovery

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

1 code implementation 29 Nov 2022 Xiaohuan Zhou, JiaMing Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou

Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Deep Active Learning for Computer Vision: Past and Future

no code implementations 27 Nov 2022 Rinyoichi Takezoe, Xu Liu, Shunan Mao, Marco Tianyu Chen, Zhanpeng Feng, Shiliang Zhang, Xiaoyu Wang

As an important data selection schema, active learning emerges as the essential component when iterating an Artificial Intelligence (AI) model.

Active Learning

ParCNetV2: Oversized Kernel with Enhanced Attention

1 code implementation ICCV 2023 Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang

Specifically, we propose a new convolutional neural network, ParCNetV2, that extends position-aware circular convolution (ParCNet) with oversized convolutions and bifurcate gate units to enhance attention.

ALBench: A Framework for Evaluating Active Learning in Object Detection

1 code implementation 27 Jul 2022 Zhanpeng Feng, Shiliang Zhang, Rinyoichi Takezoe, Wenze Hu, Manmohan Chandraker, Li-Jia Li, Vijay K. Narayanan, Xiaoyu Wang

To facilitate research in this field, this paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.

Active Learning Image Classification +4

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

2 code implementations 16 Jun 2022 Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan

However, due to an independence assumption within the output tokens, performance of single-step NAR is inferior to that of AR models, especially with a large-scale corpus.

Language Modelling speech-recognition +1

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations 31 Mar 2022 Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie

Therefore, we propose the second approach, WD-SOT, to address alignment errors by introducing a word-level diarization model, which can get rid of such timestamp alignment dependency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

1 code implementation 18 Mar 2022 Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings.

Action Detection Activity Detection +2

Extended vehicle energy dataset (eVED): an enhanced large-scale dataset for deep learning on vehicle trip energy consumption

2 code implementations 16 Mar 2022 Shiliang Zhang, Dyako Fatih, Fahmi Abdulqadir, Tobias Schwarz, Xuehui Ma

Compared with its original version, the extended VED (eVED) dataset is enhanced with accurate vehicle trip GPS coordinates, serving as a basis to associate the VED trip records with external information, e.g., road speed limits and intersections, from accessible map services to accumulate attributes that are essential in analyzing vehicle energy consumption.

Contextualize differential privacy in image database: a lightweight image differential privacy approach based on principle component analysis inverse

no code implementations 16 Feb 2022 Shiliang Zhang, Xuehui Ma, Hui Cao, Tengyuan Zhao, Yajie Yu, Zhuzhu Wang

To this end, we design a lightweight approach dedicated to privatizing an image database as a whole and preserving the statistical semantics of the image database to an adjustable level, while making individual images' contribution to such statistics indistinguishable.


ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

no code implementations 16 Feb 2022 Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao

Specifically, we first introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes in the latent prosody vector (LPV).

Contextual Instance Decoupling for Robust Multi-Person Pose Estimation

1 code implementation CVPR 2022 Dongkai Wang, Shiliang Zhang

Instead of relying on person bounding boxes to spatially differentiate persons, CID decouples persons in an image into multiple instance-aware feature maps.

Multi-Person Pose Estimation

Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference

1 code implementation NeurIPS 2021 Dongkai Wang, Shiliang Zhang, Gang Hua

Instead of inferring individual keypoints, the Pose-level Inference Network (PINet) directly infers the complete pose cues for a person from his/her visible body parts.

Multi-Person Pose Estimation

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

2 code implementations 28 Nov 2021 Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set.

Action Detection Activity Detection +2
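
As a rough illustration of the power-set idea mentioned in the entry above, the sketch below maps a multi-hot speaker-activity vector to a single class index and back. It is only a minimal sketch under stated assumptions: the function names are hypothetical, and the bit ordering (speaker 0 as the lowest bit) is a choice made here, not something taken from the paper.

```python
from typing import List


def powerset_encode(active: List[int]) -> int:
    """Map a multi-hot activity vector (one 0/1 flag per speaker) to one class index.

    Assumption for this sketch: speaker s contributes bit s of the index.
    """
    index = 0
    for speaker, flag in enumerate(active):
        if flag not in (0, 1):
            raise ValueError("activity flags must be 0 or 1")
        index |= flag << speaker
    return index


def powerset_decode(index: int, num_speakers: int) -> List[int]:
    """Invert powerset_encode for a known number of speakers."""
    if not 0 <= index < 2 ** num_speakers:
        raise ValueError("index out of range for this number of speakers")
    return [(index >> speaker) & 1 for speaker in range(num_speakers)]


if __name__ == "__main__":
    frame = [1, 0, 1]  # speakers 0 and 2 overlap in this frame
    label = powerset_encode(frame)
    print(label, powerset_decode(label, len(frame)))
```

With N speakers this yields 2^N possible classes, so a frame with overlapping speech gets exactly one label instead of N independent binary labels.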

An Energy Consumption Model for Electrical Vehicle Networks via Extended Federated-learning

no code implementations 13 Nov 2021 Shiliang Zhang

The two components collaborate to enhance learning robustness against data heterogeneities in networks.

Anomaly Detection Federated Learning

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

1 code implementation Findings (ACL) 2022 Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao

In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document.

Contrastive Learning Document Embedding +4

BeamTransformer: Microphone Array-based Overlapping Speech Detection

no code implementations 9 Sep 2021 Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan

We propose BeamTransformer, an efficient architecture to leverage beamformer's edge in spatial filtering and transformer's capability in context sequence modeling.

MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking

2 code implementations 22 Jul 2021 Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, YaoWei Wang, Yonghong Tian, Feng Wu

The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.

RGB-T Tracking

Large-Scale Spatio-Temporal Person Re-identification: Algorithms and Benchmark

2 code implementations 31 May 2021 Xiujun Shu, Xiao Wang, Xianghao Zang, Shiliang Zhang, Yuanqi Chen, Ge Li, Qi Tian

We also verified that models pre-trained on LaST can generalize well on existing datasets with short-term and cloth-changing scenarios.

Person Re-Identification

Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification

1 code implementation 11 May 2021 Xiaobin Liu, Shiliang Zhang

Specifically, given unlabeled training images, we apply teacher networks to extract corresponding features and further construct a teacher graph for each teacher network to describe the similarity relationships among training images.

Contrastive Learning Domain Adaptive Person Re-Identification +2

AAformer: Auto-Aligned Transformer for Person Re-Identification

no code implementations 2 Apr 2021 Kuan Zhu, Haiyun Guo, Shiliang Zhang, YaoWei Wang, Gaopan Huang, Honglin Qiao, Jing Liu, Jinqiao Wang, Ming Tang

In this paper, we introduce an alignment scheme in Transformer architecture for the first time and propose the Auto-Aligned Transformer (AAformer) to automatically locate both the human parts and non-human ones at patch-level.

Human Parsing Image Classification +3

Intra-Inter Camera Similarity for Unsupervised Person Re-Identification

1 code implementation CVPR 2021 Shiyu Xuan, Shiliang Zhang

The second stage considers the classification scores of each sample on different cameras as a new feature vector.

 Ranked #1 on Person Re-Identification on SYSU-30k (using extra training data)

Pseudo Label Transfer Learning +1

Viewpoint and Scale Consistency Reinforcement for UAV Vehicle Re-Identification

1 code implementation IJCV 2021 Shangzhi Teng, Shiliang Zhang, Qingming Huang, Nicu Sebe

Moreover, our method also achieves competitive performance compared with recent works on existing vehicle ReID datasets including VehicleID, VeRi-776 and VERI-Wild.

Vehicle Re-Identification

Domain Adaptive Person Re-Identification via Coupling Optimization

1 code implementation 6 Nov 2020 Xiaobin Liu, Shiliang Zhang

Extensive experiments on three large-scale datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that our coupling optimization outperforms state-of-the-art methods by a large margin.

Domain Adaptive Person Re-Identification Transfer Learning +1

Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification

no code implementations ECCV 2020 Jianing Li, Shiliang Zhang

This paper tackles this challenge through jointly enforcing visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.

Classification Domain Adaptive Person Re-Identification +3

Distillation Guided Residual Learning for Binary Convolutional Neural Networks

1 code implementation 10 Jul 2020 Jianming Ye, Shiliang Zhang, Jingdong Wang

We observe that this performance gap leads to substantial residuals between the intermediate feature maps of BCNN and FCNN.

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

1 code implementation 21 May 2020 Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.

Sound Audio and Speech Processing

Simplified Self-Attention for Transformer-based End-to-End Speech Recognition

no code implementations 21 May 2020 Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie

Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies.

speech-recognition Speech Recognition

Robust Partial Matching for Person Search in the Wild

no code implementations CVPR 2020 Yingji Zhong, Xiaoyu Wang, Shiliang Zhang

This paper also contributes a Large-Scale dataset for Person Search in the wild (LSPS), which is by far the largest and the most challenging dataset for person search.

Human Detection Person Search +1

Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System

no code implementations 3 Oct 2019 Kai Fan, Jiayi Wang, Bo Li, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhijie Yan

The performance of automatic speech recognition (ASR) systems is usually evaluated by the word error rate (WER) metric when manually transcribed data are provided; such data are, however, expensive to obtain in real scenarios.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Global-Local Temporal Representations For Video Person Re-Identification

no code implementations ICCV 2019 Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang

The long-term relations are captured by a temporal self-attention model to alleviate the occlusions and noises in video sequences.

Metric Learning Re-Ranking +1

Resolution-invariant Person Re-Identification

1 code implementation 24 Jun 2019 Shunan Mao, Shiliang Zhang, Ming Yang

RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low and high resolution images, respectively.

Person Re-Identification Super-Resolution

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

no code implementations 27 Mar 2019 Shiliang Zhang, Ming Lei, Zhijie Yan

Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model can achieve a CER of 3.41%, which is a 22.9% and 53.2% relative improvement over the baseline CTC-based systems decoded with and without a language model, respectively.

Language Modelling Machine Translation +4
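
To make the relative-improvement figures in the entry above easier to read, the sketch below backs out the baseline CERs they imply. It assumes the usual definition, relative improvement = (baseline - new) / baseline; the function name is hypothetical and the derived baselines are arithmetic consequences of that assumption, not numbers quoted from the paper.

```python
def implied_baseline(cer_percent: float, relative_improvement_percent: float) -> float:
    """Return the baseline CER implied by a new CER and a relative improvement.

    Assumes relative improvement = (baseline - new) / baseline.
    """
    fraction = relative_improvement_percent / 100.0
    if not 0.0 <= fraction < 1.0:
        raise ValueError("relative improvement must be in [0, 100) percent")
    return cer_percent / (1.0 - fraction)


if __name__ == "__main__":
    with_lm = implied_baseline(3.41, 22.9)     # baseline decoded with a language model
    without_lm = implied_baseline(3.41, 53.2)  # baseline decoded without a language model
    print(f"implied baseline CER with LM: {with_lm:.2f}%")
    print(f"implied baseline CER without LM: {without_lm:.2f}%")
```

Under that definition, the reported figures correspond to baseline CERs of roughly 4.4% (with a language model) and 7.3% (without one).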

Bi-Directional Cascade Network for Perceptual Edge Detection

2 code implementations CVPR 2019 Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang

Exploiting multi-scale representations is critical to improve edge detection for objects at different scales.

Edge Detection

Multi-scale 3D Convolution Network for Video Based Person Re-Identification

no code implementations 19 Nov 2018 Jianing Li, Shiliang Zhang, Tiejun Huang

A temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN network.

Video-Based Person Re-Identification

RAM: A Region-Aware Deep Model for Vehicle Re-Identification

no code implementations 25 Jun 2018 Xiaobin Liu, Shiliang Zhang, Qingming Huang, Wen Gao

Specifically, in addition to extracting global features, RAM also extracts features from a series of local regions.

Vehicle Re-Identification

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

1 code implementation 4 Mar 2018 Shiliang Zhang, Ming Lei, Zhijie Yan, Li-Rong Dai

In a 20,000-hour Mandarin recognition task, the LFR-trained DFSMN can achieve more than 20% relative improvement compared to the LFR-trained BLSTM.

Language Modelling speech-recognition +1

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

no code implementations 26 Feb 2018 Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially the naturalness in prosody.

speech-recognition Speech Recognition +1

LVreID: Person Re-Identification with Long Sequence Videos

no code implementations 20 Dec 2017 Jianing Li, Shiliang Zhang, Jingdong Wang, Wen Gao, Qi Tian

This paper mainly establishes a large-scale Long sequence Video database for person re-IDentification (LVreID).

Person Re-Identification

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification

26 code implementations CVPR 2018 Longhui Wei, Shiliang Zhang, Wen Gao, Qi Tian

Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., the complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network.

Generative Adversarial Network Person Re-Identification +1

Pose-driven Deep Convolutional Model for Person Re-identification

no code implementations ICCV 2017 Chi Su, Jianing Li, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian

Our deep architecture explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts.

Person Re-Identification

E$^2$BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network

no code implementations 18 Sep 2017 Xiaobin Liu, Shiliang Zhang, Tiejun Huang, Qi Tian

To conquer these issues, we propose an End-to-End BoWs (E$^2$BoWs) model based on Deep Convolutional Neural Network (DCNN).

Image Retrieval Quantization +1

GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval

no code implementations 13 Sep 2017 Longhui Wei, Shiliang Zhang, Hantao Yao, Wen Gao, Qi Tian

To solve these problems, this work proposes a Global-Local-Alignment Descriptor (GLAD) and an efficient indexing and retrieval framework.

Person Re-Identification Representation Learning +1

One-Shot Fine-Grained Instance Retrieval

no code implementations 4 Jul 2017 Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

Aiming to conquer this issue, we propose a retrieval task named One-Shot Fine-Grained Instance Retrieval (OSFGIR).

Fine-Grained Visual Categorization Image Retrieval +1

Deep Representation Learning with Part Loss for Person Re-Identification

no code implementations 4 Jul 2017 Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

The representation learning risk is evaluated by the proposed part loss, which automatically generates several parts for an image, and computes the person classification loss on each part separately.

Classification General Classification +2

DR2-Net: Deep Residual Reconstruction Network for Image Compressive Sensing

1 code implementation 19 Feb 2017 Hantao Yao, Feng Dai, Dongming Zhang, Yike Ma, Shiliang Zhang, Yongdong Zhang, Qi Tian

Accordingly, DR$^{2}$-Net consists of two components, i.e., a linear mapping network and a residual network.

Compressive Sensing Image Reconstruction

Neural Networks Models for Entity Discovery and Linking

no code implementations 11 Nov 2016 Dan Liu, Wei Lin, Shiliang Zhang, Si Wei, Hui Jiang

This paper describes the USTC_NELSLIP systems submitted to the Trilingual Entity Detection and Linking (EDL) track in 2016 TAC Knowledge Base Population (KBP) contests.

Clustering Entity Linking +1

Deep Attributes Driven Multi-Camera Person Re-identification

no code implementations 11 May 2016 Chi Su, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian

We also propose a semi-supervised attribute learning framework that progressively boosts the accuracy of attributes using only a limited amount of labeled data.

Attribute Metric Learning +1

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

no code implementations 28 Dec 2015 Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu

In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback.

Language Modelling speech-recognition +3

Multi-Task Learning With Low Rank Attribute Embedding for Person Re-Identification

no code implementations ICCV 2015 Chi Su, Fan Yang, Shiliang Zhang, Qi Tian, Larry S. Davis, Wen Gao

Since attributes are generally correlated, we introduce a low rank attribute embedding into the MTL formulation to embed original binary attributes to a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered to better describe people.

Attribute Multi-Task Learning +1

Feedforward Sequential Memory Neural Networks without Recurrent Feedback

no code implementations 9 Oct 2015 ShiLiang Zhang, Hui Jiang, Si Wei, Li-Rong Dai

We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback.

Language Modelling

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models

1 code implementation 6 May 2015 Shiliang Zhang, Hui Jiang, MingBin Xu, JunFeng Hou, Li-Rong Dai

In this paper, we propose the new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation.
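
For readers unfamiliar with the idea, the sketch below illustrates an ordinally-forgetting encoding using the recursion z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th word and alpha is a forgetting factor between 0 and 1. This is a minimal illustration rather than the authors' implementation: the function name, the integer word ids, and the default alpha = 0.5 are assumptions made for this example.

```python
from typing import List, Sequence


def fofe_encode(word_ids: Sequence[int], vocab_size: int, alpha: float = 0.5) -> List[float]:
    """Encode a variable-length sequence of word ids into one vocab_size-dimensional vector.

    Each step decays the running code by alpha, then adds the one-hot
    vector of the current word, so earlier words carry smaller weights.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is assumed to lie strictly between 0 and 1")
    code = [0.0] * vocab_size
    for word in word_ids:
        if not 0 <= word < vocab_size:
            raise ValueError("word id outside the vocabulary")
        code = [alpha * value for value in code]  # forget older context
        code[word] += 1.0                         # add the current word
    return code


if __name__ == "__main__":
    A, B, C = 0, 1, 2  # toy three-word vocabulary
    print(fofe_encode([A, B, C], vocab_size=3))
    print(fofe_encode([A, B, C, B, C], vocab_size=3))
```

The output always has the size of the vocabulary regardless of sequence length, and the position of each word is reflected in the power of alpha attached to it.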

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks

no code implementations 3 Feb 2015 Shiliang Zhang, Hui Jiang

As a result, the HOPE framework can be used as a novel tool to probe why and how NNs work and, more importantly, to learn NNs in either supervised or unsupervised ways.

Image Classification speech-recognition +1

