Search Results for author: Mu Li

Found 110 papers, 45 papers with code

Tencent Translation System for the WMT21 News Translation Task

no code implementations WMT (EMNLP) 2021 Longyue Wang, Mu Li, Fangxu Liu, Shuming Shi, Zhaopeng Tu, Xing Wang, Shuangzhi Wu, Jiali Zeng, Wen Zhang

Based on our success in the last WMT, we continuously employed advanced techniques such as large batch training, data selection and data filtering.

Data Augmentation Translation

Recurrent Attention for Neural Machine Translation

1 code implementation EMNLP 2021 Jiali Zeng, Shuangzhi Wu, Yongjing Yin, Yufan Jiang, Mu Li

Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterparts in certain scenarios, with fewer parameters and less inference time.

Machine Translation NMT +1

PreDiff: Precipitation Nowcasting with Latent Diffusion Models

no code implementations 19 Jul 2023 Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang

We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset.


Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation

1 code implementation 16 May 2023 Yuxin Ren, Zihan Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li

It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer.

Knowledge Distillation text-classification +2

XTab: Cross-table Pretraining for Tabular Transformers

1 code implementation 10 May 2023 Bingzhao Zhu, Xingjian Shi, Nick Erickson, Mu Li, George Karypis, Mahsa Shoaran

The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data.

AutoML Federated Learning +1

Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization

no code implementations 4 May 2023 Mu Li, Kanglong Fan, Kede Ma

Predicting human scanpaths when exploring panoramic videos is a challenging task due to the spherical geometry and the multimodality of the input, and the inherent uncertainty and diversity of the output.

Data Compression Imitation Learning +1

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

1 code implementation 10 Apr 2023 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation.

Denoising Image Generation +1

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

no code implementations 16 Feb 2023 Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, Mu Li

Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts.

Layout-to-Image Generation

AIM: Adapting Image Models for Efficient Video Action Recognition

1 code implementation 6 Feb 2023 Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li

Recent vision transformer based video models mostly follow the "image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks.

 Ranked #1 on Action Recognition on Diving-48 (using extra training data)

Action Classification Action Recognition +2

Multimodal Chain-of-Thought Reasoning in Language Models

2 code implementations 2 Feb 2023 Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola

Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer.

Language Modelling Science Question Answering

Parameter-Efficient Fine-Tuning Design Spaces

no code implementations 4 Jan 2023 Jiaao Chen, Aston Zhang, Xingjian Shi, Mu Li, Alex Smola, Diyi Yang

We discover the following design patterns: (i) group layers in a spindle pattern; (ii) allocate the number of trainable parameters to layers uniformly; (iii) tune all the groups; (iv) assign proper tuning strategies to different groups.

Learning Multimodal Data Augmentation in Feature Space

1 code implementation 29 Dec 2022 Zichang Liu, Zhiqiang Tang, Xingjian Shi, Aston Zhang, Mu Li, Anshumali Shrivastava, Andrew Gordon Wilson

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems.

Data Augmentation Image Classification +1

What Makes for Good Tokenizers in Vision Transformer?

no code implementations 21 Dec 2022 Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia

The transformer architecture, which has recently seen booming applications in vision tasks, departs from the widespread convolutional paradigm.

SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning

no code implementations 21 Dec 2022 M Saiful Bari, Aston Zhang, Shuai Zheng, Xingjian Shi, Yi Zhu, Shafiq Joty, Mu Li

Pre-trained large language models can efficiently interpolate human-written prompts in a natural way.

Language Modelling

Visual Prompt Tuning for Test-time Domain Adaptation

no code implementations 10 Oct 2022 Yunhe Gao, Xingjian Shi, Yi Zhu, Hao Wang, Zhiqiang Tang, Xiong Zhou, Mu Li, Dimitris N. Metaxas

First, DePT plugs visual prompts into the vision Transformer and only tunes these source-initialized prompts during adaptation.

Unsupervised Domain Adaptation

Automatic Chain of Thought Prompting in Large Language Models

4 code implementations 7 Oct 2022 Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola

Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting.
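
The idea in the excerpt above can be made concrete with a toy sketch: a few-shot CoT prompt simply interleaves question/rationale/answer demonstrations before the new question. The demonstration text and helper name below are illustrative assumptions, not code or data from the paper.

```python
# Illustrative sketch (not the paper's released code): assembling a
# few-shot chain-of-thought prompt, where each demonstration pairs a
# question with intermediate reasoning steps before the final answer.
DEMOS = [
    (
        "Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?",
        "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
        "11",
    ),
]

def build_cot_prompt(question, demos=DEMOS):
    """Concatenate reasoning demonstrations, then append the new question."""
    parts = [f"Q: {q}\nA: {r} The answer is {a}." for q, r, a in demos]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)

prompt = build_cot_prompt("A farm has 3 pens with 4 hens each. How many hens?")
```

The paper's contribution is automating the construction of such demonstrations rather than writing them by hand.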

An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework based on Semantic Blocks

1 code implementation COLING 2022 Xinnian Liang, Jing Li, Shuangzhi Wu, Jiali Zeng, Yufan Jiang, Mu Li, Zhoujun Li

To tackle this problem, in this paper we propose an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long document summarization, which is based on semantic blocks.

Document Summarization

Earthformer: Exploring Space-Time Transformers for Earth System Forecasting

1 code implementation 12 Jul 2022 Zhihan Gao, Xingjian Shi, Hao Wang, Yi Zhu, Yuyang Wang, Mu Li, Dit-yan Yeung

With the explosive growth of the spatiotemporal Earth observation data in the past decade, data-driven models that apply Deep Learning (DL) are demonstrating impressive potential for various Earth system forecasting tasks.

Earth Surface Forecasting Weather Forecasting

Removing Batch Normalization Boosts Adversarial Training

1 code implementation 4 Jul 2022 Haotao Wang, Aston Zhang, Shuai Zheng, Xingjian Shi, Mu Li, Zhangyang Wang

In addition, NoFrost achieves $23.56\%$ adversarial robustness against the PGD attack, improving on the $13.57\%$ robustness of BN-based AT.

Adversarial Robustness

Perceptual Quality Assessment of Virtual Reality Videos in the Wild

1 code implementation 13 Jun 2022 Wen Wen, Mu Li, Yiru Yao, Xiangjie Sui, Yabin Zhang, Long Lan, Yuming Fang, Kede Ma

Investigating how people perceive virtual reality videos in the wild (i.e., those captured by everyday users) is a crucial and challenging task in VR-related applications due to complex authentic distortions localized in space and time.

Saliency Detection Video Quality Assessment

Modeling Multi-Granularity Hierarchical Features for Relation Extraction

1 code implementation NAACL 2022 Xinnian Liang, Shuangzhi Wu, Mu Li, Zhoujun Li

In this paper, we propose a novel method to extract multi-granularity features based solely on the original input sentences.

Relation Extraction

Learning Confidence for Transformer-based Neural Machine Translation

1 code implementation ACL 2022 Yu Lu, Jiali Zeng, Jiajun Zhang, Shuangzhi Wu, Mu Li

Confidence estimation aims to quantify the confidence of the model prediction, providing an expectation of success.

Machine Translation NMT +1

Task-guided Disentangled Tuning for Pretrained Language Models

1 code implementation Findings (ACL) 2022 Jiali Zeng, Yufan Jiang, Shuangzhi Wu, Yongjing Yin, Mu Li

Pretrained language models (PLMs) trained on large-scale unlabeled corpus are typically fine-tuned on task-specific downstream datasets, which have produced state-of-the-art results on various NLP tasks.

Pseudocylindrical Convolutions for Learned Omnidirectional Image Compression

1 code implementation 25 Dec 2021 Mu Li, Kede Ma, Jinxing Li, David Zhang

We first describe parametric pseudocylindrical representation as a generalization of common pseudocylindrical map projections.

Image Compression

Benchmarking Multimodal AutoML for Tabular Data with Text Fields

1 code implementation 4 Nov 2021 Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola

We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well.

AutoML Benchmarking

Blending Anti-Aliasing into Vision Transformer

no code implementations NeurIPS 2021 Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

In this work, we analyze the uncharted problem of aliasing in vision transformers and explore incorporating anti-aliasing properties.

Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing

no code implementations EMNLP (sustainlp) 2021 Haoyu He, Xingjian Shi, Jonas Mueller, Sheng Zha, Mu Li, George Karypis

We aim to identify how different components in the KD pipeline affect the resulting performance and how much the optimal KD pipeline varies across different datasets/tasks, such as the data augmentation policy, the loss function, and the intermediate representation for transferring the knowledge between teacher and student.

Data Augmentation Hyperparameter Optimization

Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context

1 code implementation EMNLP 2021 Xinnian Liang, Shuangzhi Wu, Mu Li, Zhoujun Li

In terms of the local view, we first build a graph structure based on the document where phrases are regarded as vertices and the edges are similarities between vertices.

Document Embedding Keyphrase Extraction
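
The local view described in the excerpt above can be sketched in simplified form: candidate phrases are vertices and edge weights are similarities between phrase vectors. The toy vectors and the degree-centrality scoring below are illustrative assumptions, not the paper's exact boundary-aware formulation.

```python
import math

# Hedged sketch of a similarity graph over candidate phrases: vertices
# are phrases, edges are cosine similarities between (toy) phrase
# vectors, and each phrase is ranked by its total connection strength.
phrases = {
    "keyphrase extraction": [0.9, 0.1, 0.0],
    "unsupervised method":  [0.7, 0.3, 0.1],
    "weather report":       [0.0, 0.1, 0.9],  # off-topic, should rank last
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Score each vertex by the sum of its edge weights (degree centrality).
scores = {
    p: sum(cosine(vec, other) for q, other in phrases.items() if q != p)
    for p, vec in phrases.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Phrases well connected to the rest of the document score high; outliers like the off-topic phrase fall to the bottom.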

Attention Calibration for Transformer in Neural Machine Translation

no code implementations ACL 2021 Yu Lu, Jiali Zeng, Jiajun Zhang, Shuangzhi Wu, Mu Li

Attention mechanisms have achieved substantial improvements in neural machine translation by dynamically selecting relevant inputs for different predictions.

Machine Translation Translation

A Unified Efficient Pyramid Transformer for Semantic Segmentation

no code implementations 29 Jul 2021 Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo wu, Yanwei Fu, Mu Li

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries.

Semantic Segmentation

Dive into Deep Learning

1 code implementation 21 Jun 2021 Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola

This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code.

Multi-Domain Recommender Systems

Multimodal AutoML on Structured Tables with Text Fields

2 code implementations ICML Workshop AutoML 2021 Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alex Smola

We design automated supervised learning systems for data tables that not only contain numeric/categorical columns, but text fields as well.


CrossNorm and SelfNorm for Generalization under Distribution Shifts

1 code implementation ICCV 2021 Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris Metaxas

Can we develop new normalization methods to improve generalization robustness under distribution shifts?

Unity of Opposites: SelfNorm and CrossNorm for Model Robustness

no code implementations1 Jan 2021 Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris N. Metaxas

CrossNorm exchanges styles between feature channels to perform style augmentation, diversifying the content and style mixtures.

Object Recognition Unity

Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning

no code implementations 6 Nov 2020 Yufan Jiang, Shuangzhi Wu, Jing Gong, Yahui Cheng, Peng Meng, Weiliang Lin, Zhibo Chen, Mu Li

In addition, by transferring knowledge from other kinds of MRC tasks, our model achieves new state-of-the-art results in both single and ensemble settings.

AutoML Binary Classification +2

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems

no code implementations 26 Aug 2020 Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang

FeatGraph provides a flexible programming interface to express diverse GNN models by composing coarse-grained sparse templates with fine-grained user-defined functions (UDFs) on each vertex/edge.

CSER: Communication-efficient SGD with Error Reset

no code implementations NeurIPS 2020 Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks.
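
One family of remedies for this communication bottleneck is sparsified gradient exchange with a local error memory, periodically reset. The sketch below illustrates that generic idea under stated assumptions (top-1 sparsification, a fixed reset period); it is not the paper's exact CSER algorithm.

```python
# Hedged sketch: compress each gradient before "communicating" it, keep
# the uncommunicated remainder in a local error memory, and reset that
# memory periodically. All specifics here are illustrative assumptions.
def topk(vec, k=1):
    """Keep only the k largest-magnitude entries; zero the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

def compressed_steps(grads, reset_every=2):
    error = [0.0] * len(grads[0])
    sent = []
    for t, g in enumerate(grads, start=1):
        acc = [e + gi for e, gi in zip(error, g)]   # add residual error
        msg = topk(acc)                              # sparse message to send
        error = [a - m for a, m in zip(acc, msg)]    # keep what wasn't sent
        if t % reset_every == 0:
            error = [0.0] * len(error)               # periodic error reset
        sent.append(msg)
    return sent

msgs = compressed_steps([[0.5, 0.1], [0.2, 0.4]])
```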

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes

1 code implementation 24 Jun 2020 Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

Using the proposed LANS method and the learning rate scheme, we scaled up the mini-batch sizes to 96K and 33K in phases 1 and 2 of BERT pretraining, respectively.

Natural Language Understanding

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

no code implementations 4 Jun 2020 Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang

Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes.

Machine learning formation enthalpies of intermetallics

1 code implementation 26 May 2020 Zhaohan Zhang, Mu Li, Katharine Flores, Rohan Mishra

The model uses easily accessible elemental properties as descriptors and has a mean absolute error (MAE) of 0.025 eV/atom in predicting the formation enthalpy of stable binary intermetallics reported in the Materials Project database.

Materials Science
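
The MAE quoted above is simply the mean absolute difference between predicted and reference formation enthalpies. A minimal sketch, with made-up numbers for illustration:

```python
# Mean absolute error (MAE) between predicted and reference formation
# enthalpies, in eV/atom. The values below are invented for illustration.
predicted = [-0.41, -0.28, -0.55, -0.10]
reference = [-0.44, -0.25, -0.52, -0.12]

mae = sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
print(f"MAE = {mae:.3f} eV/atom")
```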

Learning Context-Based Non-local Entropy Modeling for Image Compression

no code implementations 10 May 2020 Mu Li, Kai Zhang, WangMeng Zuo, Radu Timofte, David Zhang

To address this issue, we propose a non-local operation for context modeling by employing the global similarity within the context.

Image Compression

Improving Semantic Segmentation via Self-Training

no code implementations 30 Apr 2020 Yi Zhu, Zhongyue Zhang, Chongruo wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola

In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.

Domain Generalization Semantic Segmentation

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

8 code implementations 13 Mar 2020 Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola

We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file.

Neural Architecture Search

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

4 code implementations 9 Jul 2019 Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).

Efficient and Effective Context-Based Convolutional Entropy Modeling for Image Compression

2 code implementations 24 Jun 2019 Mu Li, Kede Ma, Jane You, David Zhang, WangMeng Zuo

For the former, we directly apply a CCN to the binarized representation of an image to compute the Bernoulli distribution of each code for entropy estimation.

Image Compression
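
The entropy estimation mentioned in the excerpt above amounts to scoring each binary code against its predicted Bernoulli probability: the ideal code length of the bitstream is the sum of −log2 P(b_i). A hedged sketch with illustrative bits and probabilities:

```python
import math

# Given a predicted Bernoulli probability p_i that binary code b_i
# equals 1, the ideal total code length is sum_i -log2 P(b_i).
# Bits and probabilities below are illustrative, not from the paper.
bits  = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.8, 0.7, 0.1]   # model's P(b_i = 1)

def expected_bits(bits, probs):
    total = 0.0
    for b, p in zip(bits, probs):
        total += -math.log2(p if b == 1 else 1.0 - p)
    return total

cost = expected_bits(bits, probs)   # well under the 5 raw bits here
```

A good context model drives these probabilities toward the true statistics, shrinking the expected code length.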

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

2 code implementations 26 Apr 2019 Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.

Image Classification object-detection +3

Language Models with Transformers

1 code implementation arXiv 2019 Chenguang Wang, Mu Li, Alexander J. Smola

In this paper, we explore effective Transformer architectures for language modeling, including adding additional LSTM layers to better capture the sequential context while still keeping the computation efficient.

Ranked #2 on Language Modelling on Penn Treebank (Word Level) (using extra training data)

Language Modelling Neural Architecture Search

Learning Content-Weighted Deep Image Compression

1 code implementation 1 Apr 2019 Mu Li, WangMeng Zuo, Shuhang Gu, Jane You, David Zhang

Learning-based lossy image compression usually involves the joint optimization of rate-distortion performance.

Image Compression

Bag of Tricks for Image Classification with Convolutional Neural Networks

25 code implementations CVPR 2019 Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods.

Domain Generalization General Classification +4

Bidirectional Generative Adversarial Networks for Neural Machine Translation

no code implementations CONLL 2018 Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

To address this issue and stabilize the GAN training, in this paper, we propose a novel Bidirectional Generative Adversarial Network for Neural Machine Translation (BGAN-NMT), which aims to introduce a generator model to act as the discriminator, whereby the discriminator naturally considers the entire translation space so that the inadequate training problem can be alleviated.

Language Modelling Machine Translation +2

Approximate Distribution Matching for Sequence-to-Sequence Learning

no code implementations 24 Aug 2018 Wenhu Chen, Guanlin Li, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

Then, we interpret sequence-to-sequence learning as learning a transductive model to transform the source local latent distributions to match their corresponding target distributions.

Image Captioning Machine Translation +1

Style Transfer as Unsupervised Machine Translation

no code implementations 23 Aug 2018 Zhirui Zhang, Shuo Ren, Shujie Liu, Jianyong Wang, Peng Chen, Mu Li, Ming Zhou, Enhong Chen

Language style transfer rephrases text with specific stylistic attributes while preserving the original attribute-independent content.

NMT Style Transfer +2

Regularizing Neural Machine Translation by Target-bidirectional Agreement

no code implementations 13 Aug 2018 Zhirui Zhang, Shuangzhi Wu, Shujie Liu, Mu Li, Ming Zhou, Tong Xu

Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming as in other sequence generation tasks: errors made early in generation process are fed as inputs to the model and can be quickly amplified, harming subsequent sequence generation.

Machine Translation NMT +1

Generative Bridging Network for Neural Sequence Prediction

no code implementations NAACL 2018 Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network).

Abstractive Text Summarization Image Captioning +5

Triangular Architecture for Rare Language Translation

no code implementations ACL 2018 Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma

Neural Machine Translation (NMT) performs poorly on the low-resource language pair $(X, Z)$, especially when $Z$ is a rare language.

Machine Translation NMT +1

Joint Training for Neural Machine Translation Models with Monolingual Data

no code implementations 1 Mar 2018 Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

Monolingual data have been demonstrated to be helpful in improving translation quality of both statistical machine translation (SMT) systems and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation tasks where parallel data are not rich enough.

Domain Adaptation Machine Translation +2

Shift-Net: Image Inpainting via Deep Feature Rearrangement

2 code implementations ECCV 2018 Zhaoyi Yan, Xiaoming Li, Mu Li, WangMeng Zuo, Shiguang Shan

To this end, the encoder feature of the known region is shifted to serve as an estimation of the missing parts.

Image Inpainting

Enlarging Context with Low Cost: Efficient Arithmetic Coding with Trimmed Convolution

no code implementations 15 Jan 2018 Mu Li, Shuhang Gu, David Zhang, WangMeng Zuo

One key issue in arithmetic encoding is to predict the probability of the current coding symbol from its context, i.e., the preceding encoded symbols, which usually can be done by building a look-up table (LUT).

Image Compression
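
The LUT baseline mentioned above can be sketched with simple counts: estimate the probability of the current binary symbol from its k preceding symbols. The smoothing scheme and toy stream below are illustrative assumptions, not the paper's construction.

```python
from collections import Counter, defaultdict

# Hedged sketch of a context look-up table for binary arithmetic coding:
# P(symbol | previous k symbols) from Laplace-smoothed counts.
def build_lut(stream, k=2):
    table = defaultdict(Counter)
    for i in range(k, len(stream)):
        context = tuple(stream[i - k:i])
        table[context][stream[i]] += 1
    return table

def prob(table, context, symbol, alphabet=(0, 1)):
    counts = table[tuple(context)]
    total = sum(counts[s] + 1 for s in alphabet)   # add-one smoothing
    return (counts[symbol] + 1) / total

stream = [0, 1, 0, 1, 0, 1, 0, 1]
lut = build_lut(stream, k=2)
p = prob(lut, [0, 1], 0)   # in this stream, "0 1" is always followed by 0
```

The paper's point is that enlarging the context k makes such a table intractable, motivating a trimmed-convolution predictor instead.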

Stack-based Multi-layer Attention for Transition-based Dependency Parsing

no code implementations EMNLP 2017 Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

Although sequence-to-sequence (seq2seq) network has achieved significant success in many NLP tasks such as machine translation and text summarization, simply applying this approach to transition-based dependency parsing cannot yield a comparable performance gain as in other state-of-the-art methods, such as stack-LSTM and head selection.

Language Modelling Machine Translation +3

Sequence-to-Dependency Neural Machine Translation

no code implementations ACL 2017 Shuangzhi Wu, Dong-dong Zhang, Nan Yang, Mu Li, Ming Zhou

Nowadays a typical Neural Machine Translation (NMT) model generates translations from left to right as a linear sequence, during which latent syntactic structures of the target sentences are not explicitly considered.

Machine Translation NMT +1

Generative Bridging Network in Neural Sequence Prediction

no code implementations 28 Jun 2017 Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network).

Abstractive Text Summarization Language Modelling +2

Learning Convolutional Networks for Content-weighted Image Compression

1 code implementation CVPR 2018 Mu Li, WangMeng Zuo, Shuhang Gu, Debin Zhao, David Zhang

Therefore, the encoder, decoder, binarizer and importance map can be jointly optimized in an end-to-end manner by using a subset of the ImageNet database.

Binarization Image Compression +1

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation

no code implementations COLING 2016 Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu

In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence.

Machine Translation Translation

Deep Identity-aware Transfer of Facial Attributes

no code implementations 18 Oct 2016 Mu Li, WangMeng Zuo, David Zhang

In general, our model consists of a mask network and an attribute transform network which work in synergy to generate a photo-realistic facial image with the reference attribute.

Denoising Face Hallucination +1

Convolutional Network for Attribute-driven and Identity-preserving Human Face Generation

no code implementations 23 Aug 2016 Mu Li, WangMeng Zuo, David Zhang

Here we address this problem from the view of optimization, and suggest an optimization model to generate a human face with the given attributes while keeping the identity of the reference image.

Face Generation

On the Powerball Method for Optimization

no code implementations 24 Mar 2016 Ye Yuan, Mu Li, Jun Liu, Claire J. Tomlin

We propose a new method to accelerate the convergence of optimization algorithms.
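
The Powerball idea can be sketched as applying an element-wise power γ in [0, 1) to the gradient before the usual descent step, i.e. replacing each gradient entry g with sign(g)·|g|^γ. The step size, γ, and test function below are illustrative choices, not the paper's experiments.

```python
# Hedged sketch of a Powerball-style update: transform each gradient
# entry as sign(g) * |g|**gamma before the descent step.
def powerball_step(x, grad, lr=0.1, gamma=0.5):
    return [xi - lr * (1.0 if g >= 0 else -1.0) * abs(g) ** gamma
            for xi, g in zip(x, grad)]

# Minimize f(x) = x0^2 + x1^2 (gradient 2x) for a few iterations.
x = [4.0, -3.0]
for _ in range(50):
    grad = [2 * xi for xi in x]
    x = powerball_step(x, grad)
# x ends up close to the minimizer at the origin.
```

Note that with γ < 1 the powered update does not vanish as fast as the gradient near the optimum, so in practice the iterate settles into a small neighborhood of the minimizer rather than converging exactly.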

Revise Saturated Activation Functions

no code implementations 18 Feb 2016 Bing Xu, Ruitong Huang, Mu Li

In this paper, we revise two commonly used saturated functions, the logistic sigmoid and the hyperbolic tangent (tanh).

Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model

no code implementations 13 Jan 2016 Shi Feng, Shujie Liu, Mu Li, Ming Zhou

Aiming to resolve these problems, we propose new variations of attention-based encoder-decoder and compare them with other models on machine translation.

Image Captioning Machine Translation +4

Data Driven Resource Allocation for Distributed Learning

no code implementations 15 Dec 2015 Travis Dick, Mu Li, Venkata Krishna Pillutla, Colin White, Maria Florina Balcan, Alex Smola

In distributed machine learning, data is dispatched to multiple machines for processing.

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

2 code implementations 3 Dec 2015 Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang

This paper describes both the API design and the system implementation of MXNet, and explains how embedding of both symbolic expression and tensor operation is handled in a unified fashion.

BIG-bench Machine Learning Clustering +2

High Performance Latent Variable Models

no code implementations 21 Oct 2015 Aaron Q. Li, Amr Ahmed, Mu Li, Vanja Josifovski

Latent variable models have attracted considerable interest from industry and academia for their versatility in a wide range of applications.

Vocal Bursts Intensity Prediction

AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

no code implementations 20 Aug 2015 Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients.

Graph Partitioning via Parallel Submodular Approximation to Accelerate Distributed Machine Learning

no code implementations 18 May 2015 Mu Li, Dave G. Andersen, Alexander J. Smola

Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance.

BIG-bench Machine Learning Distributed Computing +1

Empirical Evaluation of Rectified Activations in Convolutional Network

2 code implementations 5 May 2015 Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li

In this paper we investigate the performance of different types of rectified activation functions in convolutional neural network: standard rectified linear unit (ReLU), leaky rectified linear unit (Leaky ReLU), parametric rectified linear unit (PReLU) and a new randomized leaky rectified linear units (RReLU).

General Classification Image Classification
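
The four activations compared in the paper have simple element-wise forms. A minimal sketch in their commonly used definitions (the slope values and sampling range below are the conventional defaults, shown for illustration):

```python
import random

# The four rectified activations compared above, element-wise.
def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def prelu(x, a):
    # 'a' is a learned per-channel slope in PReLU; here just a parameter.
    return x if x > 0 else a * x

def rrelu(x, lower=1/8, upper=1/3, training=True, rng=random):
    # RReLU samples the negative slope uniformly during training and
    # uses the fixed average slope at test time.
    a = rng.uniform(lower, upper) if training else (lower + upper) / 2
    return x if x > 0 else a * x
```

All four agree on positive inputs; they differ only in how (and whether) negative inputs leak through.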

Beyond Word-based Language Model in Statistical Machine Translation

no code implementations 5 Feb 2015 Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, Cheng-qing Zong

The language model is one of the most important modules in statistical machine translation, and currently word-based language models dominate this community.

Language Modelling Machine Translation +1
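
The word-based language model mentioned above is, in its simplest form, a count-based n-gram model. A toy bigram sketch (the corpus is invented for illustration):

```python
from collections import Counter, defaultdict

# Count-based word bigram model: P(word | previous word) from
# co-occurrence counts over a tiny toy corpus.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigrams[prev][word] += 1

def p(word, prev):
    counts = bigrams[prev]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

# "the" is followed by cat, mat, cat -> P(cat | the) = 2/3 here.
```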
