Search Results for author: Ce Zhang

Found 133 papers, 56 papers with code

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

no code implementations ICML 2020 Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently via quantization-based compression methods, which facilitate DNN training in two ways: (1) activations are quantized to shrink the memory consumption, and (2) gradients are quantized to reduce the communication cost.

Quantization
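
A minimal sketch of the generic recipe above: quantize a tensor with unbiased stochastic uniform rounding before communicating or storing it. This is an illustration only, not TinyScript's non-uniform scheme; the bit width and the tensor are assumed for the example.

```python
import numpy as np

def stochastic_uniform_quantize(x, num_bits=4):
    """Quantize a float tensor to 2**num_bits levels with unbiased stochastic rounding."""
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (x - lo) / scale                   # map values into [0, levels]
    floor = np.floor(normalized)
    prob_up = normalized - floor                    # round up with this probability
    codes = floor + (np.random.rand(*x.shape) < prob_up)
    return codes.astype(np.uint8), lo, scale        # integer codes + range metadata

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

# Example: compress a gradient tensor before communication, decompress after.
grad = np.random.randn(1024).astype(np.float32)
codes, lo, scale = stochastic_uniform_quantize(grad, num_bits=4)
print("mean abs error:", np.abs(grad - dequantize(codes, lo, scale)).mean())
```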

Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines

1 code implementation23 Apr 2022 Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, Ce Zhang

We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training.

Fairness

Modelling graph dynamics in fraud detection with "Attention"

1 code implementation22 Apr 2022 Susie Xi Rao, Clémence Lanfranchi, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Mo Cheng, Yinan Shan, Yang Zhao, Ce Zhang

At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions.

Fraud Detection

SHiFT: An Efficient, Flexible Search Engine for Transfer Learning

1 code implementation4 Apr 2022 Cedric Renggli, Xiaozhe Yao, Luka Kolar, Luka Rimanic, Ana Klimovic, Ce Zhang

Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch.

Transfer Learning

A Quality Index Metric and Method for Online Self-Assessment of Autonomous Vehicles Sensory Perception

1 code implementation4 Mar 2022 Ce Zhang, Azim Eskandarian

We propose an evaluation metric, namely the perception quality index (PQI), to assess the camera-based object detection algorithm performance and provide the perception quality feedback frame by frame.

Autonomous Driving Frame +2

Certifying Out-of-Domain Generalization for Blackbox Functions

no code implementations3 Feb 2022 Maurice Weber, Linyi Li, Boxin Wang, Zhikuan Zhao, Bo Li, Ce Zhang

As a result, the wider application of these techniques is currently limited by their scalability and flexibility -- these techniques often do not scale to large-scale datasets with modern deep neural networks, or cannot handle loss functions that may be non-smooth, such as the 0-1 loss.

Domain Generalization

ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock with Deep Learning and Aerial Imagery

no code implementations26 Jan 2022 Gyri Reiersen, David Dao, Björn Lütjens, Konstantin Klemmer, Kenza Amara, Attila Steinegger, Ce Zhang, Xiaoxiang Zhu

Leveraging advancements in machine learning and remote sensing technologies has promising potential for impact and scale, but the resulting estimates need to be of high quality in order to replace the current forest stock protocols for certification.

Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale

1 code implementation18 Jan 2022 Yang Li, Yu Shen, Huaijun Jiang, Wentao Zhang, Jixiang Li, Ji Liu, Ce Zhang, Bin Cui

The ever-growing demand and complexity of machine learning are putting pressure on hyper-parameter tuning systems: while the evaluation cost of models continues to increase, the scalability of state-of-the-art systems is starting to become a crucial bottleneck.

Dynamic Human Evaluation for Relative Model Comparisons

1 code implementation15 Dec 2021 Thórhildur Thorleiksdóttir, Cedric Renggli, Nora Hollenstein, Ce Zhang

Collecting human judgements is currently the most reliable evaluation method for natural language generation systems.

Text Generation

Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters

1 code implementation 10 Nov 2021 Xiangru Lian, Binhang Yuan, XueFeng Zhu, Yulong Wang, Yongjun He, Honghuan Wu, Lei Sun, Haodong Lyu, Chengjun Liu, Xing Dong, Yiqiao Liao, Mingnan Luo, Congfei Zhang, Jingru Xie, Haonan Li, Lei Chen, Renjie Huang, Jianying Lin, Chengchun Shu, Xuezhong Qiu, Zhishan Liu, Dongying Kong, Lei Yuan, Hai Yu, Sen Yang, Ce Zhang, Ji Liu

Specifically, in order to ensure both the training efficiency and the training accuracy, we design a novel hybrid training algorithm, where the embedding layer and the dense neural network are handled by different synchronization mechanisms; then we build a system called Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm.

Recommendation Systems

iFlood: A Stable and Effective Regularizer

no code implementations ICLR 2022 Yuexiang Xie, Zhen Wang, Yaliang Li, Ce Zhang, Jingren Zhou, Bolin Ding

However, our further studies uncover that the design of the loss function of Flooding can lead to a discrepancy between its objective and its implementation, and can cause instability issues.

Image Classification
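
For context, the Flooding regularizer that this paper revisits keeps the training loss from falling below a chosen "flood level" b by flipping the gradient once the loss dips under b. A minimal PyTorch sketch under that assumption (the flood-level value is illustrative, and iFlood's instance-level modification is not shown):

```python
import torch

def flooding_loss(loss, b=0.05):
    """Flooding: |loss - b| + b. Below the flood level b, the gradient flips sign,
    so the training loss floats around b instead of being driven to zero."""
    return (loss - b).abs() + b

# Tiny demonstration on a scalar "loss" that is already below the flood level.
raw = torch.tensor(0.01, requires_grad=True)
flooding_loss(raw, b=0.05).backward()
print(raw.grad)  # tensor(-1.): gradient ascent pushes the loss back up toward b
```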

Neural Methods for Logical Reasoning over Knowledge Graphs

1 code implementation ICLR 2022 Alfonso Amayuelas, Shuai Zhang, Xi Susie Rao, Ce Zhang

We introduce a set of models that use Neural Networks to create one-point vector embeddings to answer the queries.

Knowledge Graphs

Towards Automatic Bias Detection in Knowledge Graphs

1 code implementation Findings (EMNLP) 2021 Daphna Keidar, Mian Zhong, Ce Zhang, Yash Raj Shrestha, Bibek Paudel

With the recent surge in social applications relying on knowledge graphs, the need for techniques to ensure fairness in KG based methods is becoming increasingly evident.

Bias Detection Fairness +2

UNetFormer: An UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery

1 code implementation18 Sep 2021 Libo Wang, Rui Li, Ce Zhang, Shenghui Fang, Chenxi Duan, Xiaoliang Meng, Peter M. Atkinson

Semantic segmentation of remotely sensed urban scene images is required in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection, and economic assessment.

Change Detection Image Classification +2

Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee

1 code implementation30 Aug 2021 Cedric Renggli, Luka Rimanic, Nora Hollenstein, Ce Zhang

The Bayes error rate (BER) is a fundamental concept in machine learning that quantifies the best possible accuracy any classifier can achieve on a fixed probability distribution.
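
For reference, the quantity being estimated is the irreducible error of the Bayes-optimal classifier on the data distribution:

```latex
\mathrm{BER} \;=\; \mathbb{E}_{x \sim p(x)}\!\left[\, 1 - \max_{y}\, p(y \mid x) \right]
```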

LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis

no code implementations14 Aug 2021 Fan Wu, Yunhui Long, Ce Zhang, Bo Li

We show empirically that these DP GCN mechanisms are not always resilient against LinkTeller under mild privacy guarantees ($\varepsilon > 5$).

Recommendation Systems Traffic Prediction
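
As a heavily simplified illustration of the influence-analysis idea behind edge-recovery attacks of this kind (not the paper's exact LinkTeller procedure), an attacker with black-box access to a GNN inference API can perturb one node's features and measure how much another node's posterior changes; pairs with large influence are guessed to be connected. The predict_fn interface, perturbation size, and thresholding are assumptions.

```python
import numpy as np

def influence_scores(predict_fn, features, candidate_pairs, delta=1e-3):
    """predict_fn(features) -> (num_nodes, num_classes) posteriors from a black-box GNN.
    Returns an influence score for every candidate node pair (u, v)."""
    base = predict_fn(features)
    scores = {}
    for u, v in candidate_pairs:
        perturbed = features.copy()
        perturbed[u] += delta                       # nudge node u's input features
        diff = predict_fn(perturbed)[v] - base[v]   # change in node v's posterior
        scores[(u, v)] = np.linalg.norm(diff) / delta
    return scores

# Pairs whose score exceeds a chosen threshold would be predicted to share an edge.
```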

Tackling the Overestimation of Forest Carbon with Deep Learning and Aerial Imagery

no code implementations23 Jul 2021 Gyri Reiersen, David Dao, Björn Lütjens, Konstantin Klemmer, Xiaoxiang Zhu, Ce Zhang

This proposal paper describes the first systematic comparison of forest carbon estimation from aerial imagery, satellite imagery, and ground-truth field measurements via deep learning-based algorithms for a tropical reforestation project.

VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition

3 code implementations19 Jul 2021 Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Bolin Ding, Yaliang Li, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang, Bin Cui

End-to-end AutoML, which automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model selection, and hyper-parameter tuning, has attracted intensive interest from both academia and industry.

AutoML Feature Engineering +1

Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks

1 code implementation11 Jun 2021 Nezihe Merve Gürel, Xiangyu Qi, Luka Rimanic, Ce Zhang, Bo Li

In particular, we develop KEMLP by integrating a diverse set of weak auxiliary models based on their logical relationships to the main DNN model that performs the target task.

Knowledge Router: Learning Disentangled Representations for Knowledge Graphs

no code implementations NAACL 2021 Shuai Zhang, Xi Rao, Yi Tay, Ce Zhang

To this end, this paper proposes to learn disentangled representations of KG entities - a new method that disentangles the inner latent properties of KG entities.

Knowledge Graphs Representation Learning

OpenBox: A Generalized Black-box Optimization Service

7 code implementations1 Jun 2021 Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, Bin Cui

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design.

Experimental Design Transfer Learning

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness

no code implementations NeurIPS 2021 Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li

To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.

Towards Demystifying Serverless Machine Learning Training

1 code implementation17 May 2021 Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang

The appeal of serverless (FaaS) has triggered a growing interest in how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML).

A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images

1 code implementation25 Apr 2021 Libo Wang, Rui Li, Chenxi Duan, Ce Zhang, Xiaoliang Meng, Shenghui Fang

The fully convolutional network (FCN) with an encoder-decoder architecture has been the standard paradigm for semantic segmentation.

Semantic Segmentation

Distributed Learning Systems with First-order Methods

1 code implementation12 Apr 2021 Ji Liu, Ce Zhang

Scalable and efficient distributed learning is one of the main driving forces behind the recent rapid advancement of machine learning and artificial intelligence.

Quantization

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness

1 code implementation NeurIPS 2021 Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Benjamin Rubinstein, Pan Zhou, Ce Zhang, Bo Li

To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.

DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

1 code implementation20 Mar 2021 Boxin Wang, Fan Wu, Yunhui Long, Luka Rimanic, Ce Zhang, Bo Li

In this paper, we aim to explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model DATALENS.

Dimensionality Reduction

Scale-aware Neural Network for Semantic Segmentation of Multi-resolution Remote Sensing Images

no code implementations14 Mar 2021 Libo Wang, Ce Zhang, Rui Li, Chenxi Duan, Xiaoliang Meng, Peter M. Atkinson

However, MSR images suffer from two critical issues: 1) increased scale variation of geo-objects and 2) loss of detailed information at coarse spatial resolutions.

Scene Understanding Semantic Segmentation

Evolving Attention with Residual Convolutions

no code implementations20 Feb 2021 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers.

Image Classification Machine Translation +2

Switch Spaces: Learning Product Spaces with Sparse Gating

no code implementations17 Feb 2021 Shuai Zhang, Yi Tay, Wenqi Jiang, Da-Cheng Juan, Ce Zhang

In order for learned representations to be effective and efficient, it is ideal that the geometric inductive bias aligns well with the underlying structure of the data.

Knowledge Graph Completion Representation Learning

Decoding EEG Brain Activity for Multi-Modal Natural Language Processing

no code implementations17 Feb 2021 Nora Hollenstein, Cedric Renggli, Benjamin Glaus, Maria Barrett, Marius Troendle, Nicolas Langer, Ce Zhang

In this paper, we present the first large-scale study of systematically analyzing the potential of EEG brain activity data for improving natural language processing tasks, with a special focus on which features of the signal are most beneficial.

EEG Sentiment Analysis

A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

2 code implementations16 Feb 2021 Rui Li, Shunyi Zheng, Ce Zhang, Chenxi Duan, Libo Wang

Based on FPN and AAM, a novel framework named Attention Aggregation Feature Pyramid Network (A2-FPN) is developed for semantic segmentation of fine-resolution remotely sensed images.

Decision Making Scene Understanding +1

A Data Quality-Driven View of MLOps

no code implementations15 Feb 2021 Cedric Renggli, Luka Rimanic, Nezihe Merve Gürel, Bojan Karlaš, Wentao Wu, Ce Zhang

Developing machine learning models can be seen as a process similar to the one established for traditional software development.

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

2 code implementations4 Feb 2021 Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He

One of the most effective methods is error-compensated compression, which offers robust convergence speed even under 1-bit compression.
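
As background, error-compensated 1-bit compression keeps a local error buffer: each worker transmits only the signs of the compensated update plus one scale, and folds the compression error back in at the next step. A minimal sketch of this generic mechanism, assuming a per-tensor buffer (the full 1-bit Adam algorithm, which also freezes Adam's variance state after a warm-up phase, is not reproduced here):

```python
import numpy as np

class OneBitCompressor:
    """Error-feedback sign compression: transmit signs plus one scalar per tensor."""
    def __init__(self, shape):
        self.error = np.zeros(shape)

    def compress(self, update):
        compensated = update + self.error           # add back the previously lost error
        scale = np.abs(compensated).mean()          # single float sent alongside the signs
        compressed = scale * np.sign(compensated)   # what is actually communicated
        self.error = compensated - compressed       # remember what compression discarded
        return compressed

# Each worker would compress its local (momentum) update before the all-reduce:
compressor = OneBitCompressor(shape=(1000,))
sent = compressor.compress(np.random.randn(1000) * 0.01)
```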

EEG-Inception: An Accurate and Robust End-to-End Neural Network for EEG-based Motor Imagery Classification

no code implementations24 Jan 2021 Ce Zhang, Young-Keun Kim, Azim Eskandarian

The proposed CNN model, namely EEG-Inception, is built on the backbone of the Inception-Time network, which has been shown to be highly efficient and accurate for time-series classification.

Classification Data Augmentation +4

Predictive Attention Transformer: Improving Transformer with Attention Map Prediction

no code implementations1 Jan 2021 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Yunhai Tong

Instead, we model their dependencies via a chain of prediction models that take previous attention maps as input to predict the attention maps of a new layer through convolutional neural networks.

Machine Translation

Suspicious Massive Registration Detection via Dynamic Heterogeneous Graph Neural Networks

no code implementations20 Dec 2020 Susie Xi Rao, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Mo Cheng, Yinan Shan, Yang Zhao, Ce Zhang

Massive account registration has raised concerns on risk management in e-commerce companies, especially when registration increases rapidly within a short time frame.

Frame

Efficient Automatic CASH via Rising Bandits

no code implementations8 Dec 2020 Yang Li, Jiawei Jiang, Jinyang Gao, Yingxia Shao, Ce Zhang, Bin Cui

In this framework, the BO methods are used to solve the HPO problem for each ML algorithm separately, incorporating a much smaller hyperparameter space for BO methods.

Hyperparameter Optimization Multi-Armed Bandits

MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements

6 code implementations5 Dec 2020 Yang Li, Yu Shen, Jiawei Jiang, Jinyang Gao, Ce Zhang, Bin Cui

Instead of sampling configurations randomly in HB, BOHB samples configurations based on a BO surrogate model, which is constructed with the high-fidelity measurements only.

Hyperparameter Optimization
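
For orientation, Hyperband-style methods such as HB, BOHB, and MFES-HB are built on successive halving: try many configurations at a small budget and repeatedly keep the best fraction at a larger budget. A minimal sketch of plain successive halving (MFES-HB's multi-fidelity surrogate, which replaces the random sampling step, is not shown); sample_config, evaluate, and eta below are assumptions.

```python
import random

def successive_halving(sample_config, evaluate, n=27, min_budget=1, eta=3):
    """Keep the top 1/eta configurations each round while multiplying the budget by eta.
    evaluate(config, budget) must return a loss (lower is better)."""
    configs = [sample_config() for _ in range(n)]
    budget = min_budget
    while len(configs) > 1:
        losses = [evaluate(c, budget) for c in configs]
        ranked = [c for _, c in sorted(zip(losses, configs), key=lambda t: t[0])]
        configs = ranked[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]

# Toy usage: a "configuration" is a float, and evaluations get less noisy as the budget grows.
best = successive_halving(lambda: random.uniform(-1, 1),
                          lambda c, b: abs(c) + random.gauss(0, 1.0 / b))
print(best)
```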

Learning to Mutate with Hypergradient Guided Population

no code implementations NeurIPS 2020 Zhiqiang Tao, Yaliang Li, Bolin Ding, Ce Zhang, Jingren Zhou, Yun Fu

Computing the gradient of model hyperparameters, i.e., the hypergradient, enables a promising and natural way to solve the hyperparameter optimization task.

Hyperparameter Optimization

Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images

1 code implementation29 Nov 2020 Rui Li, Shunyi Zheng, Chenxi Duan, Jianlin Su, Ce Zhang

The attention mechanism can refine the extracted feature maps and boost the classification performance of the deep network, which has become an essential technique in computer vision and natural language processing.

Semantic Segmentation

xFraud: Explainable Fraud Transaction Detection

1 code implementation24 Nov 2020 Susie Xi Rao, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Zhiyao Chen, Yinan Shan, Yang Zhao, Ce Zhang

At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss.

Explainable Models Fraud Detection +1

Robust Unsupervised Small Area Change Detection from SAR Imagery Using Deep Learning

1 code implementation22 Nov 2020 Xinzheng Zhang, Hang Su, Ce Zhang, Xiaowei Gu, Xiaoheng Tan, Peter M. Atkinson

In this paper, a robust unsupervised approach is proposed for small area change detection from multi-temporal SAR images using deep learning.

Change Detection

Learning User Representations with Hypercuboids for Recommender Systems

3 code implementations11 Nov 2020 Shuai Zhang, Huoyu Liu, Aston Zhang, Yue Hu, Ce Zhang, Yumeng Li, Tanchao Zhu, Shaojian He, Wenwu Ou

Furthermore, we present two variants of hypercuboids to enhance the capability in capturing the diversities of user interests.

Collaborative Filtering Recommendation Systems

Online Active Model Selection for Pre-trained Classifiers

1 code implementation19 Oct 2020 Mohammad Reza Karimi, Nezihe Merve Gürel, Bojan Karlaš, Johannes Rausch, Ce Zhang, Andreas Krause

Given $k$ pre-trained classifiers and a stream of unlabeled data examples, how can we actively decide when to query a label so that we can distinguish the best model from the rest while making a small number of queries?

Model Selection
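
As a rough illustration of the stream setting only (not the algorithm proposed in the paper), a simple baseline queries a label whenever the pre-trained classifiers disagree and uses the revealed labels to track per-model accuracy; the models and oracle interfaces are assumptions.

```python
import numpy as np

def stream_model_selection(models, stream, oracle):
    """models: list of callables x -> predicted label; stream: iterable of inputs;
    oracle(x) -> true label, called only when we decide to query."""
    wins = np.zeros(len(models))
    queries = 0
    for x in stream:
        preds = [m(x) for m in models]
        if len(set(preds)) > 1:                  # query only when the models disagree
            y = oracle(x)
            queries += 1
            wins += np.array([p == y for p in preds], dtype=float)
    return int(np.argmax(wins)), queries         # index of the best-looking model
```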

Ease.ML/Snoopy: Towards Automatic Feasibility Studies for ML via Quantitative Understanding of "Data Quality for ML"

1 code implementation16 Oct 2020 Cedric Renggli, Luka Rimanic, Luka Kolar, Wentao Wu, Ce Zhang

In our experience of working with domain experts who are using today's AutoML systems, a common problem we encountered is what we call "unrealistic expectations" -- when users are facing a very challenging task with a noisy data acquisition process, while being expected to achieve startlingly high accuracy with machine learning (ML).

AutoML

On Convergence of Nearest Neighbor Classifiers over Feature Transformations

no code implementations NeurIPS 2020 Luka Rimanic, Cedric Renggli, Bo Li, Ce Zhang

This analysis requires in-depth understanding of the properties that connect both the transformed space and the raw feature space.

Which Model to Transfer? Finding the Needle in the Growing Haystack

no code implementations13 Oct 2020 Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic

Transfer learning has been recently popularized as a data-efficient alternative to training models from scratch, in particular for computer vision tasks where it provides a remarkably solid baseline.

Transfer Learning

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

no code implementations12 Oct 2020 Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso

MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the latency by enabling parallel lookups.

Recommendation Systems

Optimal Provable Robustness of Quantum Classification via Quantum Hypothesis Testing

no code implementations21 Sep 2020 Maurice Weber, Nana Liu, Bo Li, Ce Zhang, Zhikuan Zhao

This link leads to a tight robustness condition which puts constraints on the amount of noise a classifier can tolerate, independent of whether the noise source is natural or adversarial.

Classification General Classification +1

A Principled Approach to Data Valuation for Federated Learning

no code implementations14 Sep 2020 Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, Dawn Song

The federated SV preserves the desirable properties of the canonical SV while it can be calculated without incurring extra communication cost and is also able to capture the effect of participation order on data value.

Data Summarization Federated Learning

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

no code implementations3 Sep 2020 Rui Li, Shunyi Zheng, Chenxi Duan, Ce Zhang, Jianlin Su, P. M. Atkinson

A novel attention mechanism of kernel attention with linear complexity is proposed to alleviate the large computational demand in attention.

Semantic Segmentation

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

no code implementations26 Aug 2020 Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang

Adam is an important optimization algorithm for guaranteeing efficiency and accuracy when training on many important tasks such as BERT and ImageNet.

A Survey and Tutorial of EEG-Based Brain Monitoring for Driver State Analysis

no code implementations25 Aug 2020 Ce Zhang, Azim Eskandarian

Then, the EEG signal preprocessing, feature extraction, and classification algorithms for driver state detection are reviewed.

Autonomous Vehicles EEG

A Computationally Efficient Multiclass Time-Frequency Common Spatial Pattern Analysis on EEG Motor Imagery

no code implementations25 Aug 2020 Ce Zhang, Azim Eskandarian

The experimental results show that the proposed algorithm's average computation time is 37.22% less than that of FBCSP (the winner of BCI Competition IV) and 4.98% longer than that of the conventional CSP method.

Classification EEG +1
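
For background, the Common Spatial Pattern (CSP) filters that both FBCSP and the proposed time-frequency variant build on can be computed from the two class-averaged channel covariance matrices via a generalized eigendecomposition. A minimal two-class sketch, assuming trials shaped (n_trials, n_channels, n_samples):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=4):
    """Return spatial filters whose projected variance best separates class A from class B."""
    def avg_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)   # channel x channel covariance
    Ca, Cb = avg_cov(trials_a), avg_cov(trials_b)
    # Solve Ca w = lambda (Ca + Cb) w; extreme eigenvalues give the most discriminative filters.
    eigvals, eigvecs = eigh(Ca, Ca + Cb)
    order = np.argsort(eigvals)
    picks = np.concatenate([order[: n_filters // 2], order[-(n_filters // 2):]])
    return eigvecs[:, picks].T                                 # shape: (n_filters, n_channels)

# Features for a classifier are then typically log-variances of the filtered signals:
# feats = np.log(np.var(filters @ trial, axis=1))
```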

Land Cover Classification from Remote Sensing Images Based on Multi-Scale Fully Convolutional Network

1 code implementation1 Aug 2020 Rui Li, Shunyi Zheng, Chenxi Duan, Ce Zhang

In this paper, a Multi-Scale Fully Convolutional Network (MSFCN) with multi-scale convolutional kernel is proposed to exploit discriminative representations from two-dimensional (2D) satellite images.

General Classification

FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular Data

no code implementations 29 Jul 2020 Yuexiang Xie, Zhen Wang, Yaliang Li, Bolin Ding, Nezihe Merve Gürel, Ce Zhang, Minlie Huang, Wei Lin, Jingren Zhou

Then we instantiate this search strategy by optimizing both a dedicated graph neural network (GNN) and the adjacency tensor associated with the defined feature graph.

Recommendation Systems

MACU-Net for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

2 code implementations26 Jul 2020 Rui Li, Chenxi Duan, Shunyi Zheng, Ce Zhang, Peter M. Atkinson

In this Letter, we incorporate multi-scale features generated by different layers of U-Net and design a multi-scale skip connected and asymmetric-convolution-based U-Net (MACU-Net), for segmentation using fine-resolution remotely sensed images.

Medical Image Segmentation Semantic Segmentation

Adversarial Learning for Debiasing Knowledge Graph Embeddings

no code implementations29 Jun 2020 Mario Arduini, Lorenzo Noci, Federico Pirovano, Ce Zhang, Yash Raj Shrestha, Bibek Paudel

As a second step, we explore gender bias in KGE, and a careful examination of popular KGE algorithms suggests that sensitive attributes such as the gender of a person can be predicted from the embeddings.

Knowledge Graph Embeddings Knowledge Graphs +1

Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

1 code implementation11 May 2020 Bojan Karlaš, Peng Li, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data.

TrueBranch: Metric Learning-based Verification of Forest Conservation Projects

no code implementations21 Apr 2020 Simona Santamaria, David Dao, Björn Lütjens, Ce Zhang

Recent works propose low-cost and accurate MRV via automatically determining forest carbon from drone imagery, collected by the landowners.

Metric Learning

RAB: Provable Robustness Against Backdoor Attacks

1 code implementation19 Mar 2020 Maurice Weber, Xiaojun Xu, Bojan Karlaš, Ce Zhang, Bo Li

In addition, we theoretically show that it is possible to train the robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm which eliminates the need to sample from a noise distribution for such models.

A Robust Imbalanced SAR Image Change Detection Approach Based on Deep Difference Image and PCANet

no code implementations3 Mar 2020 Xinzheng Zhang, Hang Su, Ce Zhang, Peter M. Atkinson, Xiaoheng Tan, Xiaoping Zeng, Xin Jian

Parallel FCM clustering is applied to these two mapped DDIs to obtain three types of pseudo-labelled pixels, namely changed pixels, unchanged pixels, and intermediate pixels.

Change Detection

End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines

no code implementations28 Feb 2020 Zhuolin Yang, Zhikuan Zhao, Hengzhi Pei, Boxin Wang, Bojan Karlas, Ji Liu, Heng Guo, Bo Li, Ce Zhang

We show that for reasoning components such as MLN and a specific family of Bayesian networks it is possible to certify the robustness of the whole pipeline even with a large magnitude of perturbation which cannot be certified by existing work.

TSS: Transformation-Specific Smoothing for Robustness Certification

1 code implementation27 Feb 2020 Linyi Li, Maurice Weber, Xiaojun Xu, Luka Rimanic, Bhavya Kailkhura, Tao Xie, Ce Zhang, Bo Li

Moreover, to the best of our knowledge, TSS is the first approach that achieves nontrivial certified robustness on the large-scale ImageNet dataset.

Data Science through the looking glass and what we found there

no code implementations19 Dec 2019 Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Matteo Interlandi, Avrilia Floratou, Konstantinos Karanasos, Wentao Wu, Ce Zhang, Subru Krishnan, Carlo Curino, Markus Weimer

The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners.

Support Vector Machine Classifier via $L_{0/1}$ Soft-Margin Loss

no code implementations16 Dec 2019 Huajun Wang, Yuan-Hai Shao, Shenglong Zhou, Ce Zhang, Naihua Xiu

To distinguish all of them, in this paper, we introduce a new model equipped with an $L_{0/1}$ soft-margin loss (dubbed $L_{0/1}$-SVM), which well captures the nature of binary classification.
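
For reference, the $L_{0/1}$ soft-margin model replaces the hinge loss with a hard count of margin violations (formulation written out from the description above):

```latex
\min_{w,\,b} \;\; \frac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{i=1}^{n} \mathbb{1}\!\left[\, 1 - y_i \left(w^{\top} x_i + b\right) > 0 \,\right]
```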

ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation

no code implementations LREC 2020 Nora Hollenstein, Marius Troendle, Ce Zhang, Nicolas Langer

We recorded and preprocessed ZuCo 2.0, a new dataset of simultaneous eye-tracking and electroencephalography during natural reading and during annotation.

Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification?

1 code implementation CVPR 2021 Ruoxi Jia, Fan Wu, Xuehui Sun, Jiacen Xu, David Dao, Bhavya Kailkhura, Ce Zhang, Bo Li, Dawn Song

Quantifying the importance of each training point to a learning task is a fundamental problem in machine learning, and the estimated importance scores have been leveraged to guide a range of data workflows such as data summarization and domain adaptation.

Data Summarization Domain Adaptation

DocParser: Hierarchical Structure Parsing of Document Renderings

2 code implementations5 Nov 2019 Johannes Rausch, Octavio Martinez, Fabian Bissig, Ce Zhang, Stefan Feuerriegel

Translating renderings (e.g., PDFs, scans) into hierarchical document structures is extensively demanded in the daily routines of many real-world applications.

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

no code implementations10 Oct 2019 Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang

Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem.

Observer Dependent Lossy Image Compression

1 code implementation8 Oct 2019 Maurice Weber, Cedric Renggli, Helmut Grabner, Ce Zhang

To that end, we use a family of loss functions that allows optimizing deep image compression depending on the observer and interpolating between human-perceived visual quality and classification accuracy, enabling a more unified view of image compression.

Classification General Classification +4

An Empirical and Comparative Analysis of Data Valuation with Scalable Algorithms

no code implementations25 Sep 2019 Ruoxi Jia, Xuehui Sun, Jiacen Xu, Ce Zhang, Bo Li, Dawn Song

Existing approximation algorithms, although achieving great improvements over the exact algorithm, rely on retraining models multiple times and thus remain limited when applied to larger-scale learning tasks and real-world datasets.

Data Summarization Domain Adaptation

CogniVal: A Framework for Cognitive Word Embedding Evaluation

1 code implementation CONLL 2019 Nora Hollenstein, Antonio de la Torre, Nicolas Langer, Ce Zhang

An interesting method of evaluating word representations is by how much they reflect the semantic representations in the human brain.

EEG Word Embeddings

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

1 code implementation22 Aug 2019 Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, Dawn Song

The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley value of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement on computational complexity!

Fairness
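
A minimal sketch of the O(N log N) recursion behind that result for a single test point, assuming the utility of a subset is the fraction of the K nearest neighbors carrying the correct label (variable names are illustrative, not the authors' code):

```python
import numpy as np

def knn_shapley_single(X_train, y_train, x_test, y_test, K):
    """Exact Shapley values of all N training points for an unweighted KNN classifier
    and a single test point, computed in O(N log N)."""
    N = len(X_train)
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))  # ascending distance
    match = (y_train[order] == y_test).astype(float)
    s = np.zeros(N)
    s[N - 1] = match[N - 1] / N
    for i in range(N - 2, -1, -1):      # walk from the farthest point toward the nearest
        rank = i + 1                    # 1-indexed rank of the point order[i]
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, rank) / rank
    values = np.zeros(N)
    values[order] = s                   # scatter back to the original indexing
    return values

# Sanity check: the values sum to the utility of the full training set.
X, y = np.random.randn(50, 3), np.random.randint(0, 2, 50)
v = knn_shapley_single(X, y, np.zeros(3), 1, K=5)
print(v.sum())  # equals the fraction of the 5 nearest neighbors labeled 1
```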

$\texttt{DeepSqueeze}$: Decentralization Meets Error-Compensated Compression

no code implementations17 Jul 2019 Hanlin Tang, Xiangru Lian, Shuang Qiu, Lei Yuan, Ce Zhang, Tong Zhang, Ji Liu

Since \emph{decentralized} training has been shown to be superior to traditional \emph{centralized} training in communication-restricted scenarios, a natural question to ask is "how to apply the error-compensated technology to decentralized learning to further reduce the communication cost."

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

no code implementations20 Apr 2019 Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang

Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training.

General Classification +1

Advancing NLP with Cognitive Language Processing Signals

3 code implementations4 Apr 2019 Nora Hollenstein, Maria Barrett, Marius Troendle, Francesco Bigiolli, Nicolas Langer, Ce Zhang

Cognitive language processing data such as eye-tracking features have shown improvements on single NLP tasks.

EEG General Classification +3

Sensing Social Media Signals for Cryptocurrency News

no code implementations27 Mar 2019 Johannes Beck, Roberta Huang, David Lindner, Tian Guo, Ce Zhang, Dirk Helbing, Nino Antulov-Fantulin

The ability to track and monitor relevant and important news in real-time is of crucial interest in multiple industrial sectors.

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

no code implementations1 Mar 2019 Cedric Renggli, Bojan Karlaš, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, Ce Zhang

Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development.

Towards Efficient Data Valuation Based on the Shapley Value

no code implementations27 Feb 2019 Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos

In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory.
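
For context, the generic baseline that such work accelerates is the permutation-sampling Monte Carlo estimator of the Shapley value; a minimal sketch, assuming a utility(subset_indices) function that trains and evaluates a model on the given subset:

```python
import random

def monte_carlo_shapley(n_points, utility, n_perms=100):
    """Estimate each training point's Shapley value by averaging its marginal
    contribution over random permutations (each step needs a utility evaluation)."""
    values = [0.0] * n_points
    for _ in range(n_perms):
        perm = list(range(n_points))
        random.shuffle(perm)
        prefix, prev_u = [], utility([])
        for idx in perm:
            prefix.append(idx)
            u = utility(prefix)
            values[idx] += (u - prev_u) / n_perms
            prev_u = u
    return values
```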

Entity Recognition at First Sight: Improving NER with Eye Movement Information

1 code implementation NAACL 2019 Nora Hollenstein, Ce Zhang

Previous research shows that eye-tracking data contains information about the lexical and syntactic properties of text, which can be used to improve natural language processing models.

Named Entity Recognition NER

Exploring galaxy evolution with generative models

no code implementations3 Dec 2018 Kevin Schawinski, M. Dennis Turp, Ce Zhang

Methods: By learning a latent space representation of the data, we can use this network to forward model and explore hypotheses in a data-driven way.

Distributed Learning over Unreliable Networks

no code implementations17 Oct 2018 Chen Yu, Hanlin Tang, Cedric Renggli, Simon Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu

Most of today's distributed machine learning systems assume {\em reliable networks}: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message.

$D^2$: Decentralized Training over Decentralized Data

no code implementations ICML 2018 Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, Ji Liu

While training a machine learning model using multiple workers, each of which collects data from its own data source, it would be useful when the data collected from different workers are unique and different.

Image Classification Multi-view Subspace Clustering

Using transfer learning to detect galaxy mergers

no code implementations25 May 2018 Sandro Ackermann, Kevin Schawinski, Ce Zhang, Anna K. Weigel, M. Dennis Turp

We investigate the use of deep convolutional neural networks (deep CNNs) for automatic visual detection of galaxy mergers.

General Classification Transfer Learning

ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction

no code implementations SEMEVAL 2018 Jonathan Rotsztejn, Nora Hollenstein, Ce Zhang

Reliably detecting relevant relations between entities in unstructured text is a valuable resource for knowledge extraction, which is why it has awakened significant interest in the field of Natural Language Processing.

General Classification Relation Classification

PSFGAN: a generative adversarial network system for separating quasar point sources and host galaxy light

no code implementations23 Mar 2018 Dominic Stark, Barthelemy Launet, Kevin Schawinski, Ce Zhang, Michael Koss, M. Dennis Turp, Lia F. Sartori, Hantian Zhang, Yiru Chen, Anna K. Weigel

We test the method using Sloan Digital Sky Survey (SDSS) r-band images with artificial AGN point sources added, which are then removed using the GAN and, for comparison, with parametric methods using GALFIT.

Astrophysics of Galaxies Data Analysis, Statistics and Probability

D$^2$: Decentralized Training over Decentralized Data

no code implementations19 Mar 2018 Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, Ji Liu

While training a machine learning model using multiple workers, each of which collects data from their own data sources, it would be most useful when the data collected from different workers can be {\em unique} and {\em different}.

Image Classification

AutoML from Service Provider's Perspective: Multi-device, Multi-tenant Model Selection with GP-EI

no code implementations17 Mar 2018 Chen Yu, Bojan Karlas, Jie Zhong, Ce Zhang, Ji Liu

In this paper, we focus on the AutoML problem from the \emph{service provider's perspective}, motivated by the following practical consideration: When an AutoML service needs to serve {\em multiple users} with {\em multiple devices} at the same time, how can we allocate these devices to users in an efficient way?

AutoML Model Selection

Communication Compression for Decentralized Training

no code implementations NeurIPS 2018 Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu

In this paper, we explore a natural question: {\em can the combination of both techniques lead to a system that is robust to both bandwidth and latency?}

DataBright: Towards a Global Exchange for Decentralized Data Ownership and Trusted Computation

1 code implementation13 Feb 2018 David Dao, Dan Alistarh, Claudiu Musat, Ce Zhang

We illustrate that trusted computation can enable the creation of an AI market, where each data point has an exact value that should be paid to its creator.

Asynchronous Decentralized Parallel Stochastic Gradient Descent

1 code implementation ICML 2018 Xiangru Lian, Wei Zhang, Ce Zhang, Ji Liu

Can we design an algorithm that is robust in a heterogeneous environment, while being communication efficient and maintaining the best-possible convergence rate?

Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads

no code implementations24 Aug 2017 Tian Li, Jie Zhong, Ji Liu, Wentao Wu, Ce Zhang

We ask, as a "service provider" that manages a shared cluster of machines among all our users running machine learning workloads, what is the resource allocation strategy that maximizes the global satisfaction of all our users?

Fairness Image Classification +2

ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning

no code implementations ICML 2017 Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang

We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees?

Quantization

MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data?

no code implementations29 Jul 2017 Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang

We then compare the performance of the top winning code available from Kaggle with that of running machine learning clouds from both Azure and Amazon on mlbench.

General Classification

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

1 code implementation NeurIPS 2017 Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu

On network configurations with low bandwidth or high latency, D-PSGD can be up to one order of magnitude faster than its well-optimized centralized counterparts.
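
A minimal single-process simulation of the decentralized pattern (ring topology, gossip averaging with the two neighbors, then a local SGD step, with no parameter server); the quadratic toy objective, step size, and topology are assumptions for illustration:

```python
import numpy as np

def d_psgd_ring(grad_fn, n_workers=8, dim=10, steps=200, lr=0.1):
    """Each worker averages its model copy with its two ring neighbors, then applies
    a local gradient step; no central parameter server is involved."""
    models = np.random.randn(n_workers, dim)
    for _ in range(steps):
        left, right = np.roll(models, 1, axis=0), np.roll(models, -1, axis=0)
        models = (models + left + right) / 3.0            # gossip averaging step
        for w in range(n_workers):
            models[w] -= lr * grad_fn(models[w], w)       # local SGD step on worker w's data
    return models.mean(axis=0)

# Toy problem: each worker holds a different target; the consensus optimum is their mean.
targets = np.random.randn(8, 10)
final = d_psgd_ring(lambda x, w: x - targets[w])          # gradient of 0.5 * ||x - target_w||^2
print(np.allclose(final, targets.mean(axis=0), atol=1e-2))  # True
```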

Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond

no code implementations15 May 2017 Heng Guo, Kaan Kara, Ce Zhang

For Markov chain Monte Carlo methods, one of the greatest discrepancies between theory and system is the scan order - while most theoretical development on the mixing time analysis deals with random updates, real-world systems are implemented with systematic scans.

Generative Adversarial Networks recover features in astrophysical images of galaxies beyond the deconvolution limit

no code implementations1 Feb 2017 Kevin Schawinski, Ce Zhang, Hantian Zhang, Lucas Fowler, Gokula Krishnan Santhanam

Observations of astrophysical objects such as galaxies are limited by various sources of random and systematic noise from the sky background, the optical system of the telescope and the detector used to record the data.

The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning

1 code implementation16 Nov 2016 Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang

When applied to linear models together with double sampling, we save up to another 1.7x in data movement compared with uniform quantization.

Quantization

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

1 code implementation14 Jun 2016 Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré

Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs.

Asynchrony begets Momentum, with an Application to Deep Learning

3 code implementations31 May 2016 Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré

Since asynchronous methods have better hardware efficiency, this result may shed light on when asynchronous execution is more efficient for deep learning systems.

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

no code implementations NeurIPS 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.

Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries

no code implementations20 Jul 2015 Yuke Zhu, Ce Zhang, Christopher Ré, Li Fei-Fei

The complexity of the visual world creates significant challenges for comprehensive visual understanding.

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

no code implementations22 Jun 2015 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.

Matrix Completion

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

1 code implementation16 Apr 2015 Stefan Hadjis, Firas Abuzaid, Ce Zhang, Christopher Ré

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals.

Incremental Knowledge Base Construction Using DeepDive

no code implementations3 Feb 2015 Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré

Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.

Parallel Feature Selection Inspired by Group Testing

no code implementations NeurIPS 2014 Yingbo Zhou, Utkarsh Porwal, Ce Zhang, Hung Q. Ngo, XuanLong Nguyen, Christopher Ré, Venu Govindaraju

Superior performance of our method is demonstrated on a challenging relation extraction task from a very large dataset that has both redundant features and a sample size on the order of millions.

feature selection General Classification +1

Feature Engineering for Knowledge Base Construction

no code implementations24 Jul 2014 Christopher Ré, Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang

Our approach to KBC is based on joint probabilistic inference and learning, but we do not see inference as either a panacea or a magic bullet: inference is a tool that allows us to be systematic in how we construct, debug, and improve the quality of such systems.

Feature Engineering

A machine-compiled macroevolutionary history of Phanerozoic life

no code implementations11 Jun 2014 Shanan E. Peters, Ce Zhang, Miron Livny, Christopher Ré

Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of palaeontological data.

Reading Comprehension

DimmWitted: A Study of Main-Memory Statistical Analytics

1 code implementation28 Mar 2014 Ce Zhang, Christopher Ré

We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine.

An Approximate, Efficient LP Solver for LP Rounding

no code implementations NeurIPS 2013 Srikrishna Sridhar, Stephen Wright, Christopher Re, Ji Liu, Victor Bittorf, Ce Zhang

Many problems in machine learning can be solved by rounding the solution of an appropriate linear program.
