Search Results for author: Da-Cheng Juan

Found 36 papers, 15 papers with code

Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search

no code implementations · 6 Jul 2021 · Zheyu Yan, Da-Cheng Juan, Xiaobo Sharon Hu, Yiyu Shi

Emerging device-based Computing-in-Memory (CiM) has proven to be a promising candidate for highly energy-efficient deep neural network (DNN) computation.

Neural Architecture Search

CARLS: Cross-platform Asynchronous Representation Learning System

1 code implementation · 26 May 2021 · Chun-Ta Lu, Yun Zeng, Da-Cheng Juan, Yicheng Fan, Zhe Li, Jan Dlabal, Yi-Ting Chen, Arjun Gopalan, Allan Heydon, Chun-Sung Ferng, Reah Miyara, Ariel Fuxman, Futang Peng, Zhen Li, Tom Duerig, Andrew Tomkins

In this work, we propose CARLS, a novel framework for augmenting the capacity of existing deep learning frameworks by enabling multiple components (model trainers, knowledge makers, and knowledge banks) to work together asynchronously across hardware platforms.

Curriculum Learning, Representation Learning

OmniNet: Omnidirectional Representations from Transformers

1 code implementation · 1 Mar 2021 · Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.

Few-Shot Learning, Fine-tuning +3

Switch Spaces: Learning Product Spaces with Sparse Gating

no code implementations · 17 Feb 2021 · Shuai Zhang, Yi Tay, Wenqi Jiang, Da-Cheng Juan, Ce Zhang

For learned representations to be effective and efficient, the geometric inductive bias should ideally align well with the underlying structure of the data.

Knowledge Graph Completion, Representation Learning

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

no code implementations · ICLR 2021 · Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Fine-tuning, Language understanding +2

Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations · 1 Jan 2021 · Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

Dot-product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling, Machine Translation +2

Context-Aware Temperature for Language Modeling

no code implementations · 1 Jan 2021 · Pei-Hsin Wang, Sheng-Iou Hsieh, Shih-Chieh Chang, Yu-Ting Chen, Da-Cheng Juan, Jia-Yu Pan, Wei Wei

Current practices for applying temperature scaling assume either a fixed temperature or a manually crafted, dynamically changing schedule.

Language Modelling
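Temperature scaling divides the logits by a temperature T before the softmax. With T > 1 the distribution becomes flatter, and with T < 1 it becomes sharper. The plain-Python sketch below compares a few fixed temperatures. It also includes a hypothetical linearly annealed schedule (`linear_schedule`, with assumed endpoints `t_start` and `t_end`) as one example of a manually crafted schedule. It does not implement the context-aware temperature proposed in the paper.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature, stabilised by subtracting the max."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_schedule(step: int, total_steps: int,
                    t_start: float = 2.0, t_end: float = 1.0) -> float:
    """Hypothetical hand-crafted schedule: anneal T linearly from t_start to t_end."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return t_start + frac * (t_end - t_start)


logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # The largest probability shrinks as T grows (flatter distribution).
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

A fixed T corresponds to calling `softmax_with_temperature` with a constant. A scheduled T would instead pass `linear_schedule(step, total_steps)` at each training step.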

Graph Autoencoders with Deconvolutional Networks

no code implementations · 22 Dec 2020 · Jia Li, Tomas Yu, Da-Cheng Juan, Arjun Gopalan, Hong Cheng, Andrew Tomkins

Recent studies have indicated that Graph Convolutional Networks (GCNs) act as a low-pass filter in the spectral domain and encode smoothed node representations.

Graph Generation
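One way to see the low-pass behaviour is through the renormalized GCN propagation matrix D^{-1/2}(A + I)D^{-1/2}. This is the standard Kipf and Welling form, assumed here rather than taken from this paper. Under this matrix, a smooth node signal passes through unchanged, while an alternating, high-frequency signal on the same graph is strongly attenuated. The four-node path graph and both signals are illustrative choices.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_propagation(adj: Matrix) -> Matrix:
    """GCN propagation matrix D^{-1/2} (A + I) D^{-1/2} (renormalisation trick)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(p: Matrix, x: List[float]) -> List[float]:
    """One propagation step: matrix-vector product p @ x."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(p))]


def norm(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x))


# Path graph 0-1-2-3.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
p = normalized_propagation(adj)

smooth = [math.sqrt(d) for d in (2, 3, 3, 2)]  # D^{1/2} 1: eigenvector with eigenvalue 1
rough = [1.0, -1.0, 1.0, -1.0]                 # alternating, high-frequency signal

print(norm(propagate(p, smooth)) / norm(smooth))  # ~1.0: passed through
print(norm(propagate(p, rough)) / norm(rough))    # much smaller: attenuated
```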

Adversarial Robustness Across Representation Spaces

no code implementations · CVPR 2021 · Pranjal Awasthi, George Yu, Chun-Sung Ferng, Andrew Tomkins, Da-Cheng Juan

In this work, we extend the above setting to the problem of training deep neural networks that are simultaneously robust to perturbations applied in multiple natural representation spaces.

Adversarial Robustness, Image Classification

Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization

no code implementations · NeurIPS 2020 · Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

To preserve the knowledge learned from previous instances, we propose a method that protects the path by preventing the gradient updates of one instance from overriding past updates computed from previous instances when those instances are dissimilar.

Continual Learning, Fine-tuning

AirConcierge: Generating Task-Oriented Dialogue via Efficient Large-Scale Knowledge Retrieval

1 code implementation · Findings of the Association for Computational Linguistics 2020 · Chieh-Yang Chen, Pei-Hsin Wang, Shih-Chieh Chang, Da-Cheng Juan, Wei Wei, Jia-Yu Pan

Despite recent success in neural task-oriented dialogue systems, developing such a system for real-world use requires accessing large-scale knowledge bases (KBs), which cannot simply be encoded by neural approaches such as memory network mechanisms.

Task-Oriented Dialogue Systems, Text-to-SQL

Question Answering with Long Multiple-Span Answers

1 code implementation · Findings of the Association for Computational Linguistics 2020 · Ming Zhu, Aman Ahuja, Da-Cheng Juan, Wei Wei, Chandan K. Reddy

To this end, we present MASH-QA, a Multiple Answer Spans Healthcare Question Answering dataset from the consumer health domain, where answers may need to be excerpted from multiple, non-consecutive passages spread across a long document.

Question Answering

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

no code implementations · 12 Jul 2020 · Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Fine-tuning, Language understanding +2

Remix: Rebalanced Mixup

no code implementations · 8 Jul 2020 · Hsin-Ping Chou, Shih-Chieh Chang, Jia-Yu Pan, Wei Wei, Da-Cheng Juan

In this work, we propose a new regularization technique, Remix, that relaxes Mixup's formulation and enables the mixing factors of features and labels to be disentangled.
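Mixup draws one factor λ, typically from a Beta(α, α) distribution, and applies it to both the features and the labels. The sketch below shows one way to disentangle the two factors. It keeps Mixup's feature interpolation but lets the label factor depart from λ when the two classes are highly imbalanced. The class-count rule and the thresholds `kappa` and `tau` are assumptions based on our reading of the paper, not its reference implementation.

```python
import random
from typing import List, Sequence


def sample_lambda(alpha: float = 1.0) -> float:
    """Mixup mixing factor drawn from Beta(alpha, alpha)."""
    return random.betavariate(alpha, alpha)


def mix(a: Sequence[float], b: Sequence[float], factor: float) -> List[float]:
    """Convex combination factor * a + (1 - factor) * b."""
    return [factor * u + (1.0 - factor) * v for u, v in zip(a, b)]


def remix_label_factor(lam: float, n_i: int, n_j: int,
                       kappa: float = 3.0, tau: float = 0.5) -> float:
    """Label factor that may differ from the feature factor `lam` (assumed rule).

    n_i, n_j: training-set sizes of the classes of samples i and j.
    """
    ratio = n_i / n_j
    if ratio >= kappa and lam < tau:
        return 0.0  # class i is the majority: give the label to minority class j
    if ratio <= 1.0 / kappa and 1.0 - lam < tau:
        return 1.0  # class j is the majority: give the label to minority class i
    return lam  # otherwise behave like ordinary Mixup


# Sample i from a majority class (900 examples), sample j from a minority class (100).
x_i, y_i = [1.0, 0.0], [1.0, 0.0]
x_j, y_j = [0.0, 1.0], [0.0, 1.0]
lam = 0.3  # fixed here for reproducibility; normally sample_lambda(alpha)
x_mixed = mix(x_i, x_j, lam)
y_mixed = mix(y_i, y_j, remix_label_factor(lam, n_i=900, n_j=100))
print(x_mixed, y_mixed)  # the label is assigned entirely to the minority class j
```

In this sketch the features are still mixed with λ = 0.3, but the label factor becomes 0. That decoupling is the point: the mixed label is no longer tied to the feature mixing factor.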

Robust Processing-In-Memory Neural Networks via Noise-Aware Normalization

no code implementations · 7 Jul 2020 · Li-Huang Tsai, Shih-Chieh Chang, Yu-Ting Chen, Jia-Yu Pan, Wei Wei, Da-Cheng Juan

In this paper, we propose a noise-agnostic method to achieve robust neural network performance against any noise setting.

Object Detection, Semantic Segmentation

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation · 2 May 2020 · Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

Dot-product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric)

Abstractive Text Summarization, Dialogue Generation +6

Sparse Sinkhorn Attention

1 code implementation · ICML 2020 · Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend.

Document Classification, Image Generation +2
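Sparse Sinkhorn Attention builds on Sinkhorn normalization. This procedure alternately normalizes the rows and columns of a positive matrix until the matrix is approximately doubly stochastic. The sketch below shows only that normalization step, computed in log space for numerical stability on a square matrix. The iteration count and the example scores are assumptions, and the paper's block-sorting and attention layers are not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(log_scores: Matrix, n_iters: int = 20) -> Matrix:
    """Approximately doubly stochastic matrix from a square matrix of log-scores."""
    n = len(log_scores)
    log_p = [row[:] for row in log_scores]  # work on a copy
    for _ in range(n_iters):
        # Row normalisation: subtract each row's logsumexp.
        for r in range(n):
            lse = logsumexp(log_p[r])
            log_p[r] = [v - lse for v in log_p[r]]
        # Column normalisation: subtract each column's logsumexp.
        for c in range(n):
            lse = logsumexp([log_p[r][c] for r in range(n)])
            for r in range(n):
                log_p[r][c] -= lse
    return [[math.exp(v) for v in row] for row in log_p]


scores = [[1.0, 0.2, -0.5], [0.3, 2.0, 0.1], [-1.0, 0.4, 1.5]]
p = sinkhorn(scores)
print([round(sum(row), 4) for row in p])  # row sums close to 1
print([round(sum(p[r][c] for r in range(3)), 4) for c in range(3)])  # column sums close to 1
```

The column step runs last, so column sums equal 1 up to floating-point error. Row sums converge toward 1 as `n_iters` grows.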

Learning with Hierarchical Complement Objective

no code implementations · 17 Nov 2019 · Hao-Yun Chen, Li-Huang Tsai, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Label hierarchies are common in vision-related problems, ranging from explicit label hierarchies in image classification to latent label hierarchies in semantic segmentation.

General Classification, Image Classification +1

Natural Adversarial Sentence Generation with Gradient-based Perturbation

1 code implementation · 6 Sep 2019 · Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, Cho-Jui Hsieh

This work proposes a novel algorithm to generate natural language adversarial input for text classification models, in order to investigate the robustness of these models.

Sentence Embeddings, Sentiment Analysis +1

A2N: Attending to Neighbors for Knowledge Graph Inference

no code implementations · ACL 2019 · Trapit Bansal, Da-Cheng Juan, Sujith Ravi, Andrew McCallum

State-of-the-art models for knowledge graph completion aim at learning a fixed embedding representation of entities in a multi-relational graph which can generalize to infer unseen entity relationships at test time.

Knowledge Graph Completion, Link Prediction

COCO-GAN: Conditional Coordinate Generative Adversarial Network

no code implementations · ICLR 2019 · Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen

Because the patches are generated independently of one another, the approach enables a wide range of new applications. First, "Patch-Inspired Image Generation" generates an entire image from a single patch.

Image Generation, Scene Generation

COCO-GAN: Generation by Parts via Conditional Coordinating

1 code implementation · ICCV 2019 · Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen

On the computation side, COCO-GAN has a built-in divide-and-conquer paradigm that reduces memory requirements during training and inference, provides high parallelism, and can generate parts of images on demand.

Face Generation

Improving Adversarial Robustness via Guided Complement Entropy

2 code implementations · ICCV 2019 · Hao-Yun Chen, Jhao-Hong Liang, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Adversarial robustness has emerged as an important topic in deep learning, because carefully crafted attack samples can significantly degrade a model's performance.

Adversarial Defense, Adversarial Robustness +1

Complement Objective Training

1 code implementation · ICLR 2019 · Hao-Yun Chen, Pei-Hsin Wang, Chun-Hao Liu, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Although widely adopted, cross entropy as the primary objective mostly exploits information from the ground-truth class to maximize data likelihood, and largely ignores information from the complement (incorrect) classes.

Language understanding, Natural Language Understanding
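The limitation is easy to see numerically. Two predictions with the same ground-truth probability receive the same cross-entropy loss, regardless of how the remaining probability mass is spread over the incorrect classes. The sketch below computes a normalized complement entropy alongside cross entropy. As we understand the paper, training alternates between the two objectives and maximizes complement entropy to flatten the incorrect-class predictions. The exact normalization used here is an assumption: no 1/(K-1) factor, and per-sample rather than batch-averaged.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Standard cross entropy: depends only on the ground-truth probability."""
    return -math.log(probs[target])


def complement_entropy(probs: Sequence[float], target: int) -> float:
    """Entropy over the complement (incorrect) classes.

    Non-target probabilities are renormalised by (1 - p_target), so the value
    peaks at log(K - 1) when the incorrect classes share probability evenly.
    """
    rest = 1.0 - probs[target]
    if rest <= 0.0:
        return 0.0  # all mass on the target: no complement distribution
    h = 0.0
    for j, p in enumerate(probs):
        if j != target and p > 0.0:
            q = p / rest
            h -= q * math.log(q)
    return h


flat = [0.7, 0.1, 0.1, 0.1]       # incorrect classes share mass evenly
peaked = [0.7, 0.28, 0.01, 0.01]  # one incorrect class stands out
for name, probs in (("flat", flat), ("peaked", peaked)):
    print(name, round(cross_entropy(probs, 0), 4), round(complement_entropy(probs, 0), 4))
# Same cross entropy, but the flat prediction has higher complement entropy.
```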

Graph-RISE: Graph-Regularized Image Semantic Embedding

1 code implementation · 14 Feb 2019 · Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi

Learning image representations to capture fine-grained semantics has been a challenging and important task enabling many applications such as image search and clustering.

General Classification, Graph Learning +2

InstaNAS: Instance-aware Neural Architecture Search

2 code implementations · 26 Nov 2018 · An-Chieh Cheng, Chieh Hubert Lin, Da-Cheng Juan, Wei Wei, Min Sun

Conventional Neural Architecture Search (NAS) aims to find a single architecture that achieves the best performance, usually by optimizing task-related learning objectives such as accuracy.

Neural Architecture Search

Searching Toward Pareto-Optimal Device-Aware Neural Architectures

no code implementations · 29 Aug 2018 · An-Chieh Cheng, Jin-Dong Dong, Chi-Hung Hsu, Shu-Huan Chang, Min Sun, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance on many tasks, such as image classification and language understanding.

Image Classification, Language understanding

DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures

no code implementations · ECCV 2018 · Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, which optimizes for both device-related objectives (e.g., inference time and memory usage) and device-agnostic objectives (e.g., accuracy and model size).

Image Classification, Language Modelling
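Here, Pareto-optimal means that no other candidate is at least as good on every objective and strictly better on at least one. The sketch below filters a set of candidate architectures down to the non-dominated ones, assuming every objective is to be minimized. The architecture names and the (error %, latency ms) numbers are made up, and DPP-Net's progressive search itself is not reproduced.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are treated as 'lower is better' (e.g. error rate, latency).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, objs in candidates.items()
        if not any(dominates(other, objs)
                   for other_name, other in candidates.items() if other_name != name)
    ]


# Hypothetical (error %, latency ms) pairs for four candidate architectures.
candidates = {
    "arch_a": (8.0, 20.0),
    "arch_b": (6.5, 35.0),
    "arch_c": (9.0, 25.0),  # dominated by arch_a: worse error and worse latency
    "arch_d": (6.0, 50.0),
}
print(pareto_front(candidates))  # ['arch_a', 'arch_b', 'arch_d']
```

Device-related objectives such as inference time and device-agnostic ones such as error rate fit the same tuple. A maximized objective like accuracy can be negated or replaced by error rate.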

HyperPower: Power- and Memory-Constrained Hyper-Parameter Optimization for Neural Networks

no code implementations · 6 Dec 2017 · Dimitrios Stamoulis, Ermao Cai, Da-Cheng Juan, Diana Marculescu

While selecting the hyper-parameters of Neural Networks (NNs) has so far been treated as an art, the emergence of more complex, deeper architectures poses growing challenges to designers and Machine Learning (ML) practitioners, especially when power and memory constraints must be considered.

NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks

2 code implementations · 15 Oct 2017 · Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, Diana Marculescu

We also propose the "energy-precision ratio" (EPR) metric to guide machine learners in selecting an energy-efficient CNN architecture that better trades off the energy consumption and prediction accuracy.
