no code implementations • 13 Mar 2025 • Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, Jian Wang
Inspired by the Human Visual System (HVS) that links global quality to the local texture of different regions and their visual saliency, we propose a Kaleidoscope Video Quality Assessment (KVQ) framework, which aims to effectively assess both saliency and local texture, thereby facilitating the assessment of global quality.
1 code implementation • 31 Jan 2025 • Yunpeng Qu, Kun Yuan, Jinhua Hao, Kai Zhao, Qizhi Xie, Ming Sun, Chao Zhou
Image Super-Resolution (ISR) has seen significant progress with the introduction of remarkable generative models.
1 code implementation • 31 Jan 2025 • Tongda Xu, Xiyan Cai, Xinjie Zhang, Xingtong Ge, Dailan He, Ming Sun, Jingjing Liu, Ya-Qin Zhang, Jian Li, Yan Wang
Recent advancements in diffusion models have been leveraged to address inverse problems without additional training, and Diffusion Posterior Sampling (DPS) (Chung et al., 2022a) is among the most popular approaches.
1 code implementation • 18 Dec 2024 • Jingwei Bao, Jinhua Hao, Pengcheng Xu, Ming Sun, Chao Zhou, Shuyuan Zhu
High-resolution (HR) images are commonly downscaled to low-resolution (LR) to reduce bandwidth, followed by upscaling to restore their original details.
Ranked #1 on Image Rescaling on DIV2K val-q30-2x
no code implementations • 14 Dec 2024 • Yang Liu, Li Wan, Yiteng Huang, Ming Sun, Yangyang Shi, Florian Metze
Deep learning models like Convolutional Neural Networks and transformers have shown impressive capabilities in speech verification, gaining considerable attention in the research community.
no code implementations • 13 Sep 2024 • Anfeng Xu, Biqiao Zhang, Shuyu Kong, Yiteng Huang, Zhaojun Yang, Sangeeta Srivastava, Ming Sun
Keyword spotting (KWS) is an important speech processing component for smart devices with voice assistance capability.
no code implementations • 27 Aug 2024 • Zhenyu Wang, Shuyu Kong, Li Wan, Biqiao Zhang, Yiteng Huang, Mumin Jin, Ming Sun, Xin Lei, Zhaojun Yang
Existing keyword spotting (KWS) systems primarily rely on predefined keyword phrases.
no code implementations • 23 Aug 2024 • Zhenyu Wang, Li Wan, Biqiao Zhang, Yiteng Huang, Shang-Wen Li, Ming Sun, Xin Lei, Zhaojun Yang
A keyword spotting (KWS) engine that is continuously running on device is exposed to various speech signals that are usually unseen before.
1 code implementation • 22 Aug 2024 • Ming Sun, Lihua Jing, Zixuan Zhu, Rui Wang
Furthermore, due to the perceptibility, diversity, and similarity of facial datasets, many state-of-the-art backdoor attacks lose effectiveness on face recognition tasks.
1 code implementation • 21 Aug 2024 • Qiao Mo, Yukang Ding, Jinhua Hao, Qiang Zhu, Ming Sun, Chao Zhou, Feiyu Chen, Shuyuan Zhu
Specifically, the predictor estimates pixel offsets between the first and second compression, which are then utilized to divide different patterns.
no code implementations • 10 Aug 2024 • Zhengang Lu, Hongsheng Qin, Jing Li, Ming Sun, Jiubin Tan
To address the limitations of existing smartphone microscopes, which feature short working distances and inadequate transmission imaging for industrial in-situ inspection, we propose a novel long-working distance reflective smartphone microscopy system (LD-RSM).
1 code implementation • 23 Jul 2024 • Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu
To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment.
1 code implementation • 20 Jul 2024 • Rui Qin, Ming Sun, Chao Zhou, Bin Wang
However, we find that the efficacy of recent methods obviously diminishes when employed on image data with blur, while image data with intentional blur constitute a substantial proportion of general data.
no code implementations • CVPR 2024 • Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang
In this paper, we propose a VQA method named PTM-VQA, which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks, enabling benefits for VQA from different aspects.
no code implementations • 17 May 2024 • Jinchuan Zhang, Bei Hui, Chong Mu, Ming Sun, Ling Tian
Concretely, $\mathsf{HisRES}$ comprises two distinctive modules excelling in structuring historically relevant events within TKGs, including a multi-granularity evolutionary encoder that captures structural and temporal dependencies of the most recent snapshots, and a global relevance encoder that concentrates on crucial correlations among events relevant to queries from the entire history.
1 code implementation • 17 Apr 2024 • Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei LI, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Fangyuan Kong, Haotian Fan, Yifang Xu, Haoran Xu, Mengduo Yang, Jie zhou, Jiaze Li, Shijie Wen, Mai Xu, Da Li, Shunyu Yao, Jiazhi Du, WangMeng Zuo, Zhibo Li, Shuai He, Anlong Ming, Huiyuan Fu, Huadong Ma, Yong Wu, Fie Xue, Guozhi Zhao, Lina Du, Jie Guo, Yu Zhang, huimin zheng, JunHao Chen, Yue Liu, Dulan Zhou, Kele Xu, Qisheng Xu, Tao Sun, Zhixiang Ding, Yuhang Hu
This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the KVQ dataset collected from a popular short-form video platform, i.e., the Kuaishou/Kwai platform.
1 code implementation • 15 Apr 2024 • Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, Hongyu An, Xinfeng Zhang, Zhiyuan Song, Ziyue Dong, Qing Zhao, Xiaogang Xu, Pengxu Wei, Zhi-chao Dou, Gui-ling Wang, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Cansu Korkmaz, A. Murat Tekalp, Yubin Wei, Xiaole Yan, Binren Li, Haonan Chen, Siqi Zhang, Sihan Chen, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi, Anjali Sarvaiya, Pooja Choksy, Jagrit Joshi, Shubh Kawa, Kishor Upla, Sushrut Patwardhan, Raghavendra Ramachandra, Sadat Hossain, Geongi Park, S. M. Nadim Uddin, Hao Xu, Yanhui Guo, Aman Urumbekov, Xingzhuo Yan, Wei Hao, Minghan Fu, Isaac Orais, Samuel Smith, Ying Liu, Wangwang Jia, Qisheng Xu, Kele Xu, Weijun Yuan, Zhan Li, Wenqin Kuang, Ruijin Guan, Ruting Deng, Zhao Zhang, Bo wang, Suiyi Zhao, Yan Luo, Yanyan Wei, Asif Hussain Khan, Christian Micheloni, Niki Martinel
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained.
no code implementations • 18 Mar 2024 • Haolan Chen, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, Wei Hu
In particular, we develop a cascaded controllable diffusion model that aims to optimize the extraction of information from low-resolution images.
1 code implementation • CVPR 2024 • Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu
Specifically, the ITA module aggregates temporal information from consecutive frames and coding priors, while the MNA module globally captures spatial information guided by residual frames.
1 code implementation • 8 Mar 2024 • Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou
Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution (ISR) recently.
1 code implementation • CVPR 2024 • Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen
Short-form UGC video platforms, like Kwai and TikTok, have been an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement, and kaleidoscope creation, etc.
no code implementations • 9 Feb 2024 • Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang
Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_{\theta}(X_0|y)$, with a predefined diffusion model $p_{\theta}(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$.
no code implementations • 18 Jan 2024 • Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide
Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations.
Automatic Speech Recognition (ASR)
no code implementations • 8 Jan 2024 • Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei
Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted.
no code implementations • 26 Oct 2023 • Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang
However, we find that a codebook based on HR reconstruction may not effectively capture the complex correlations between low-resolution (LR) and HR images.
no code implementations • 1 Aug 2023 • Hongbo Liu, Mingda Wu, Kun Yuan, Ming Sun, Yansong Tang, Chuanchuan Zheng, Xing Wen, Xiu Li
Video quality assessment (VQA) has attracted growing attention in recent years.
no code implementations • 31 Jul 2023 • Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen
Second, the perceptual quality of a video exhibits a multi-distortion distribution, due to the differences in the duration and probability of occurrence for various distortions.
no code implementations • 19 Jul 2023 • Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, Yusheng Zhang, Rongyu Zhang, Hang Shi, Qihang Xu, Longan Xiao, Zhiliang Ma, Mirko Agarla, Luigi Celona, Claudio Rota, Raimondo Schettini, Zhiwei Huang, Yanan Li, Xiaotao Wang, Lei Lei, Hongye Liu, Wei Hong, Ironhead Chuang, Allen Lin, Drake Guan, Iris Chen, Kae Lou, Willy Huang, Yachun Tasi, Yvonne Kao, Haotian Fan, Fangyuan Kong, Shiqi Zhou, Hao liu, Yu Lai, Shanshan Chen, Wenqi Wang, HaoNing Wu, Chaofeng Chen, Chunzheng Zhu, Zekun Guo, Shiling Zhao, Haibing Yin, Hongkui Wang, Hanene Brachemi Meftah, Sid Ahmed Fezza, Wassim Hamidouche, Olivier Déforges, Tengfei Shi, Azadeh Mansouri, Hossein Motamednia, Amir Hossein Bakhtiari, Ahmad Mahmoudi Aznaveh
61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions.
1 code implementation • ICCV 2023 • Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang
To enlarge RF with contained LUT sizes, we propose a novel Reconstructed Convolution (RC) module, which decouples channel-wise and spatial calculation.
1 code implementation • 13 Apr 2023 • Kai Zhao, Kun Yuan, Ming Sun, Xing Wen
Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content.
no code implementations • CVPR 2023 • Kai Zhao, Kun Yuan, Ming Sun, Mading Li, Xing Wen
Blind image quality assessment (BIQA) aims to automatically evaluate the perceived quality of a single image, whose performance has been improved by deep learning-based methods in recent years.
no code implementations • 17 Feb 2023 • Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun
The third, proposed by us, is a hybrid solution in which the model is trained with a small set of aligned data and then tuned with a sizeable unaligned dataset.
no code implementations • 9 Nov 2022 • Haichuan Yang, Zhaojun Yang, Li Wan, Biqiao Zhang, Yangyang Shi, Yiteng Huang, Ivaylo Enchev, Limin Tang, Raziel Alvarez, Ming Sun, Xin Lei, Raghuraman Krishnamoorthi, Vikas Chandra
This paper proposes a hardware-efficient architecture, Linearized Convolution Network (LiCo-Net) for keyword spotting.
no code implementations • 4 Nov 2022 • Kepeng Xu, Li Xu, Gang He, Chang Wu, Zijia Ma, Ming Sun, Yu-Wing Tai
To evaluate the performance of the proposed method, we construct a corresponding multi-frame dataset using HDR video of the HDR10 standard to conduct a comprehensive evaluation of different methods.
no code implementations • 14 Aug 2022 • Gang He, Shaoyi Long, Li Xu, Chang Wu, Jinjia Zhou, Ming Sun, Xing Wen, Yurong Dai
Joint super-resolution and inverse tone-mapping (SR-ITM) aims to enhance the visual quality of videos that have quality deficiencies in resolution and dynamic range.
no code implementations • 27 Jun 2022 • Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas
We conclude that this improvement in ASC performance comes from the regularization effect of using AET and not from the network's improved ability to discern between acoustic events.
no code implementations • 25 May 2022 • Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, Javen Qinfeng Shi, Dong Gong, Dan Zhu, Mengdi Sun, Guannan Chen, Yang Hu, Haowei Li, Baozhu Zou, Zhen Liu, Wenjie Lin, Ting Jiang, Chengzhi Jiang, Xinpeng Li, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Juan Marín-Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Chunyang Li, Long Bao, Gang He, Ziyao Xu, Li Xu, Gen Zhan, Ming Sun, Xing Wen, Junlin Li, Shuang Feng, Fei Lei, Rui Liu, Junxiang Ruan, Tianhong Dai, Wei Li, Zhan Lu, Hengyan Liu, Peian Huang, Guangyu Ren, Yonglin Luo, Chang Liu, Qiang Tu, Fangya Li, Ruipeng Gang, Chenghua Li, Jinjing Li, Sai Ma, Chenming Liu, Yizhen Cao, Steven Tel, Barthelemy Heyrman, Dominique Ginhac, Chul Lee, Gahyeon Kim, Seonghyun Park, An Gia Vien, Truong Thanh Nhat Mai, Howoon Yoon, Tu Vo, Alexander Holston, Sheir Zaheer, Chan Y. Park
The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations).
no code implementations • 22 Mar 2022 • Meng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang
Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization.
1 code implementation • ICCV 2021 • BoYu Chen, Peixia Li, Baopu Li, Chen Lin, Chuming Li, Ming Sun, Junjie Yan, Wanli Ouyang
We present BN-NAS, neural architecture search with Batch Normalization, to accelerate neural architecture search (NAS).
no code implementations • 7 Aug 2021 • BoYu Chen, Peixia Li, Baopu Li, Chuming Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang
Then, a compact set of the possible combinations for different token pooling and attention sharing mechanisms is constructed.
2 code implementations • ICCV 2021 • BoYu Chen, Peixia Li, Chuming Li, Baopu Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang
We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition.
Ranked #548 on Image Classification on ImageNet
no code implementations • 28 May 2021 • Ming Sun, Haoxuan Dou, Baopu Li, Lei Cui, Junjie Yan, Wanli Ouyang
Data sampling plays a pivotal role in training deep learning models.
no code implementations • ECCV 2020 • Ming Sun, Haoxuan Dou, Junjie Yan
Transfer learning can boost the performance on the target task by leveraging the knowledge of the source domain.
no code implementations • 5 Feb 2021 • Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang
Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well.
Automatic Speech Recognition (ASR)
1 code implementation • CVPR 2021 • Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu
To develop a practical method for learning complex inception convolution based on the data, a simple but effective search algorithm, referred to as efficient dilation optimization (EDO), is developed.
1 code implementation • 12 Dec 2020 • Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng
Furthermore, the bipartite match of ED harms the training efficiency due to the large ground truth number in crowd scenes.
1 code implementation • ICCV 2021 • Yuanzheng Ci, Chen Lin, Ming Sun, BoYu Chen, Hongwen Zhang, Wanli Ouyang
The automation of neural architecture design has been a coveted alternative to human experts.
no code implementations • 21 Oct 2020 • Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang
Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts adaptively control the parameter-wise learning rate (e.g., Adam and RMSProp).
no code implementations • 13 Oct 2020 • Yixin Gao, Noah D. Stein, Chieh-Chi Kao, Yunliang Cai, Ming Sun, Tao Zhang, Shiv Vitaladevuni
Since the WW model is trained with the AFE-processed audio data, its performance is sensitive to AFE variations, such as gain changes.
1 code implementation • NeurIPS 2020 • Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang
On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is currently the best performing single model without extra training data.
no code implementations • 28 Sep 2020 • Mingzhu Shen, Feng Liang, Chuming Li, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang
Automatic search of Quantized Neural Networks (QNN) has attracted a lot of attention.
no code implementations • ECCV 2020 • Ronghao Guo, Chen Lin, Chuming Li, Keyu Tian, Ming Sun, Lu Sheng, Junjie Yan
Specifically, the difficulties of architecture search in such a complex space have been eliminated by the proposed stabilized share-parameter proxy, which employs Stochastic Gradient Langevin Dynamics to enable fast shared parameter sampling, so as to achieve stabilized measurement of architecture performance even in search spaces with complex topological structures.
no code implementations • CVPR 2020 • Junran Peng, Xingyuan Bu, Ming Sun, Zhao-Xiang Zhang, Tieniu Tan, Junjie Yan
Training with more data has always been the most stable and effective way of improving performance in the deep learning era.
no code implementations • 3 Apr 2020 • Gerrit Schellenberger, Laurence P. David, Jan Vrtilek, Ewan O'Sullivan, Jeremy Lim, William Forman, Ming Sun, Francoise Combes, Philippe Salome, Christine Jones, Simona Giacintucci, Alastair Edge, Fabio Gastaldello, Pasquale Temi, Fabrizio Brighenti, Sandro Bardelli
This indicates that the two giant molecular clouds seen in absorption are most likely within the sphere of influence of the supermassive black hole.
Astrophysics of Galaxies
no code implementations • 21 Feb 2020 • Bowen Shi, Ming Sun, Krishna C. Puvvada, Chieh-Chi Kao, Spyros Matsoukas, Chao Wang
We study few-shot acoustic event detection (AED) in this paper.
no code implementations • ICLR 2020 • Feng Liang, Chen Lin, Ronghao Guo, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang
However, the allocation pattern used for classification is usually adopted directly for object detectors, which is proved to be sub-optimal.
no code implementations • NeurIPS 2019 • Junran Peng, Ming Sun, Zhao-Xiang Zhang, Tieniu Tan, Junjie Yan
Instead of searching and constructing an entire network, NATS explores the architecture space on the basis of an existing network and reuses its weights.
no code implementations • CVPR 2020 • Xiang Li, Chen Lin, Chuming Li, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang
In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the posterior fading problem, which compromises the effectiveness of shared weights.
1 code implementation • 16 Sep 2019 • Weimin Wang, Weiran Wang, Ming Sun, Chao Wang
Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns.
no code implementations • 5 Sep 2019 • Junran Peng, Ming Sun, Zhao-Xiang Zhang, Tieniu Tan, Junjie Yan
With the combination of these two designs, an architecture transformation scheme can be discovered to adapt a network designed for image classification to the task of object detection.
no code implementations • ICCV 2019 • Junran Peng, Ming Sun, Zhao-Xiang Zhang, Tieniu Tan, Junjie Yan
Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust to scale variance.
no code implementations • 1 Jul 2019 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems.
no code implementations • NAACL 2019 • Courtney Mansfield, Ming Sun, Yuzong Liu, Ankur Gandhe, Björn Hoffmeister
We find subword models with additional linguistic features yield the best performance (with a word error rate of 0.17%).
no code implementations • NIPS Workshop CDNNRIA 2018 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models.
no code implementations • 29 Apr 2019 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
This paper presents our work on training acoustic event detection (AED) models using an unlabeled dataset.
2 code implementations • NeurIPS 2018 • Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu
The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos.
no code implementations • 19 Oct 2018 • Yuriy Mishchenko, Yusuf Goren, Ming Sun, Chris Beauchene, Spyros Matsoukas, Oleg Rybakov, Shiv Naga Prasad Vitaladevuni
We investigate low-bit quantization to reduce computational cost of deep neural network (DNN) based keyword spotting (KWS).
1 code implementation • 1 Aug 2018 • Grant R. Tremblay, Françoise Combes, J. B. Raymond Oonk, Helen R. Russell, Michael A. McDonald, Massimo Gaspari, Bernd Husemann, Paul E. J. Nulsen, Brian R. McNamara, Stephen L. Hamer, Christopher P. O'Dea, Stefi A. Baum, Timothy A. Davis, Megan Donahue, G. Mark Voit, Alastair C. Edge, Elizabeth L. Blanton, Malcolm N. Bremer, Esra Bulbul, Tracy E. Clarke, Laurence P. David, Louise O. V. Edwards, Dominic A. Eggerman, Andrew C. Fabian, William R. Forman, Christine Jones, Nathaniel Kerman, Ralph P. Kraft, Yuan Li, Meredith C. Powell, Scott W. Randall, Philippe Salomé, Aurora Simionescu, Yuanyuan Su, Ming Sun, C. Megan Urry, Adrian N. Vantyghem, Belinda J. Wilkes, John A. ZuHone
The entire scenario is therefore consistent with a galaxy-spanning "fountain", wherein cold gas clouds drain into the black hole accretion reservoir, powering jets and bubbles that uplift a cooling plume of low-entropy multiphase gas, which may stimulate additional cooling and accretion as part of a self-regulating feedback loop.
Astrophysics of Galaxies
1 code implementation • ECCV 2018 • Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding
Attention-based learning for fine-grained image recognition remains a challenging task, where most of the existing methods treat each object part in isolation, while neglecting the correlations among them.
Ranked #67 on Fine-Grained Image Classification on Stanford Cars
no code implementations • 5 May 2017 • Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Geng-Shen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni
Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields $67.6\%$ relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.
no code implementations • 5 Feb 2017 • Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur
Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations.
no code implementations • LREC 2016 • Ming Sun, Yun-Nung Chen, Zhenhao Hua, Yulian Tamres-Rudnicky, Arnab Dash, Alexander Rudnicky
Users will interact with an individual app on smart devices (e.g., phone, TV, car) to fulfill a specific goal (e.g., find a photographer), but users may also pursue more complex tasks that will span multiple domains and apps (e.g., plan a wedding ceremony).
no code implementations • 30 Apr 2013 • Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe
For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.