Search Results for author: Ming Sun

Found 59 papers, 13 papers with code

Compact Generalized Non-local Network

2 code implementations NeurIPS 2018 Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu

The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos.

Object Detection Object Recognition +1

Inception Convolution with Efficient Dilation Search

1 code implementation CVPR 2021 Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu

To develop a practical method for learning complex inception convolution based on the data, a simple but effective search algorithm, referred to as efficient dilation optimization (EDO), is developed.

Human Detection Instance Segmentation +4

Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition

1 code implementation ECCV 2018 Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding

Attention-based learning for fine-grained image recognition remains a challenging task, where most of the existing methods treat each object part in isolation, while neglecting the correlations among them.

Fine-Grained Image Recognition Metric Learning

Improving Auto-Augment via Augmentation-Wise Weight Sharing

1 code implementation NeurIPS 2020 Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang

On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is currently the best performing single model without extra training data.

DETR for Crowd Pedestrian Detection

1 code implementation 12 Dec 2020 Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng

Furthermore, the bipartite match of ED harms the training efficiency due to the large ground truth number in crowd scenes.

Pedestrian Detection

BN-NAS: Neural Architecture Search with Batch Normalization

1 code implementation ICCV 2021 BoYu Chen, Peixia Li, Baopu Li, Chen Lin, Chuming Li, Ming Sun, Junjie Yan, Wanli Ouyang

We present BN-NAS, neural architecture search with Batch Normalization (BN-NAS), to accelerate neural architecture search (NAS).

Neural Architecture Search

Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

1 code implementation 13 Apr 2023 Kai Zhao, Kun Yuan, Ming Sun, Xing Wen

Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content.

Video Quality Assessment Visual Question Answering (VQA)

Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

1 code implementation ICCV 2023 Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang

To enlarge RF with contained LUT sizes, we propose a novel Reconstructed Convolution (RC) module, which decouples channel-wise and spatial calculation.

Image Super-Resolution

Acoustic scene analysis with multi-head attention networks

1 code implementation 16 Sep 2019 Weimin Wang, Weiran Wang, Ming Sun, Chao Wang

Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns.

Acoustic Scene Classification General Classification +1

Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

no code implementations 5 May 2017 Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Geng-Shen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni

Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields 67.6% relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.

Small-Footprint Keyword Spotting

An Empirical Evaluation of Zero Resource Acoustic Unit Discovery

no code implementations 5 Feb 2017 Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur

Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations.

Acoustic Unit Discovery

Generalized Canonical Correlation Analysis for Classification

no code implementations 30 Apr 2013 Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe

For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.

Classification General Classification

Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training

no code implementations NIPS Workshop CDNNRIA 2018 Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models.

Event Detection Quantization

AppDialogue: Multi-App Dialogues for Intelligent Assistants

no code implementations LREC 2016 Ming Sun, Yun-Nung Chen, Zhenhao Hua, Yulian Tamres-Rudnicky, Arnab Dash, Alex Rudnicky

Users will interact with an individual app on smart devices (e.g., phone, TV, car) to fulfill a specific goal (e.g., find a photographer), but users may also pursue more complex tasks that will span multiple domains and apps (e.g., plan a wedding ceremony).

Compression of Acoustic Event Detection Models With Quantized Distillation

no code implementations 1 Jul 2019 Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems.

Event Detection Knowledge Distillation +1

Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection

no code implementations 5 Sep 2019 Junran Peng, Ming Sun, Zhao-Xiang Zhang, Tieniu Tan, Junjie Yan

With the combination of these two designs, an architecture transformation scheme could be discovered to adapt a network designed for image classification to the task of object detection.

Image Classification Neural Architecture Search +3

POD: Practical Object Detection with Scale-Sensitive Network

no code implementations ICCV 2019 Junran Peng, Ming Sun, Zhao-Xiang Zhang, Tieniu Tan, Junjie Yan

Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust to scale variance.

Object object-detection +1

Improving One-shot NAS by Suppressing the Posterior Fading

no code implementations CVPR 2020 Xiang Li, Chen Lin, Chuming Li, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the posterior fading problem, which compromises the effectiveness of shared weights.

Neural Architecture Search object-detection +2

Computation Reallocation for Object Detection

no code implementations ICLR 2020 Feng Liang, Chen Lin, Ronghao Guo, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

However, the classification allocation pattern is usually adopted directly for the object detector, which has proved to be sub-optimal.

Instance Segmentation Neural Architecture Search +4

Powering One-shot Topological NAS with Stabilized Share-parameter Proxy

no code implementations ECCV 2020 Ronghao Guo, Chen Lin, Chuming Li, Keyu Tian, Ming Sun, Lu Sheng, Junjie Yan

Specifically, the difficulties of architecture searching in such a complex space have been eliminated by the proposed stabilized share-parameter proxy, which employs Stochastic Gradient Langevin Dynamics to enable fast shared parameter sampling, so as to achieve stabilized measurement of architecture performance even in search spaces with complex topological structures.

Neural Architecture Search

On Front-end Gain Invariant Modeling for Wake Word Spotting

no code implementations 13 Oct 2020 Yixin Gao, Noah D. Stein, Chieh-Chi Kao, Yunliang Cai, Ming Sun, Tao Zhang, Shiv Vitaladevuni

Since the WW model is trained with the AFE-processed audio data, its performance is sensitive to AFE variations, such as gain changes.

Adaptive Gradient Method with Resilience and Momentum

no code implementations 21 Oct 2020 Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang

Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts adaptively control the parameter-wise learning rate (e.g., Adam and RMSProp).

PSViT: Better Vision Transformer via Token Pooling and Attention Sharing

no code implementations 7 Aug 2021 BoYu Chen, Peixia Li, Baopu Li, Chuming Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang

Then, a compact set of the possible combinations for different token pooling and attention sharing mechanisms is constructed.

Federated Self-Supervised Learning for Acoustic Event Classification

no code implementations 22 Mar 2022 Meng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang

Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization.

Classification Continual Learning +3

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

no code implementations 25 May 2022 Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, Javen Qinfeng Shi, Dong Gong, Dan Zhu, Mengdi Sun, Guannan Chen, Yang Hu, Haowei Li, Baozhu Zou, Zhen Liu, Wenjie Lin, Ting Jiang, Chengzhi Jiang, Xinpeng Li, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Juan Marín-Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Chunyang Li, Long Bao, Gang He, Ziyao Xu, Li Xu, Gen Zhan, Ming Sun, Xing Wen, Junlin Li, Shuang Feng, Fei Lei, Rui Liu, Junxiang Ruan, Tianhong Dai, Wei Li, Zhan Lu, Hengyan Liu, Peian Huang, Guangyu Ren, Yonglin Luo, Chang Liu, Qiang Tu, Fangya Li, Ruipeng Gang, Chenghua Li, Jinjing Li, Sai Ma, Chenming Liu, Yizhen Cao, Steven Tel, Barthelemy Heyrman, Dominique Ginhac, Chul Lee, Gahyeon Kim, Seonghyun Park, An Gia Vien, Truong Thanh Nhat Mai, Howoon Yoon, Tu Vo, Alexander Holston, Sheir Zaheer, Chan Y. Park

The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations).

Image Restoration Vocal Bursts Intensity Prediction

Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework

no code implementations 27 Jun 2022 Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas

We conclude that this improvement in ASC performance comes from the regularization effect of using AET and not from the network's improved ability to discern between acoustic events.

Acoustic Scene Classification Multi-Task Learning +1

Global Priors Guided Modulation Network for Joint Super-Resolution and Inverse Tone-Mapping

no code implementations 14 Aug 2022 Gang He, Shaoyi Long, Li Xu, Chang Wu, Jinjia Zhou, Ming Sun, Xing Wen, Yurong Dai

Joint super-resolution and inverse tone-mapping (SR-ITM) aims to enhance the visual quality of videos that have quality deficiencies in resolution and dynamic range.

4k inverse tone mapping +3

SDRTV-to-HDRTV Conversion via Spatial-Temporal Feature Fusion

no code implementations 4 Nov 2022 Kepeng Xu, Li Xu, Gang He, Chang Wu, Zijia Ma, Ming Sun, Yu-Wing Tai

To evaluate the performance of the proposed method, we construct a corresponding multi-frame dataset using HDR video of the HDR10 standard to conduct a comprehensive evaluation of different methods.

Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

no code implementations 17 Feb 2023 Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun

The third, proposed by us, is a hybrid solution in which the model is trained with a small set of aligned data and then tuned with a sizeable unaligned dataset.

Quality-aware Pre-trained Models for Blind Image Quality Assessment

no code implementations CVPR 2023 Kai Zhao, Kun Yuan, Ming Sun, Mading Li, Xing Wen

Blind image quality assessment (BIQA) aims to automatically evaluate the perceived quality of a single image, whose performance has been improved by deep learning-based methods in recent years.

Blind Image Quality Assessment Self-Supervised Learning

Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment

no code implementations 31 Jul 2023 Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen

Second, the perceptual quality of a video exhibits a multi-distortion distribution, due to the differences in the duration and probability of occurrence for various distortions.

Action Recognition Blocking +2

Blind Image Super-resolution with Rich Texture-Aware Codebooks

no code implementations 26 Oct 2023 Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang

However, we find that a codebook based on HR reconstruction may not effectively capture the complex correlations between low-resolution (LR) and HR images.

Blind Super-Resolution Image Super-Resolution

FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

no code implementations 8 Jan 2024 Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted.

Acoustic echo cancellation Speech Enhancement

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

no code implementations 18 Jan 2024 Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

KVQ: Kwai Video Quality Assessment for Short-form Videos

no code implementations 11 Feb 2024 Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen

Short-form UGC video platforms, like Kwai and TikTok, have become an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement, kaleidoscopic creation, etc.

Video Quality Assessment Visual Question Answering (VQA)

XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

no code implementations 8 Mar 2024 Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou

Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution (ISR) recently.

Image Super-Resolution

CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

no code implementations 15 Mar 2024 Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu

Specifically, the ITA module aggregates temporal information from consecutive frames and coding priors, while the MNA module globally captures spatial information guided by residual frames.

CasSR: Activating Image Power for Real-World Image Super-Resolution

no code implementations 18 Mar 2024 Haolan Chen, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, Wei Hu

In particular, we develop a cascaded controllable diffusion model that aims to optimize the extraction of information from low-resolution images.

Image Restoration Image Super-Resolution
