Search Results for author: Kenji Kawaguchi

Found 121 papers, 53 papers with code

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

1 code implementation • 20 Nov 2024 • Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang

To address this, we develop AnchorAttention, a plug-and-play attention method that alleviates numerical issues caused by BFloat16, improves long-context capabilities, and speeds up training.

Computational Efficiency • Position
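
A minimal sketch of the numerical issue the abstract describes, not of AnchorAttention itself: emulating bfloat16's truncated mantissa in NumPy shows the rotary-embedding angle losing precision as absolute position grows. The frequency constant below is a made-up toy value.

```python
import numpy as np

def to_bf16(x):
    # Emulate bfloat16 by keeping only the top 16 bits of the float32 pattern
    # (truncation instead of round-to-nearest; close enough for illustration).
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

# RoPE rotates query/key pairs by angle = position * frequency. With only an
# 8-bit mantissa, large absolute positions lose the low-order digits that
# relative position information depends on.
for pos in [16, 1024, 65536]:
    angle = np.float32(pos * 0.01)  # toy frequency
    err = abs(np.cos(angle) - np.cos(to_bf16(angle)))
    print(f"position {pos:6d}  |cos error| = {float(err):.2e}")
```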

Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

1 code implementation • 1 Nov 2024 • Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen

We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve large language model (LLM) generation.

Decision Making • Informativeness +2

Investigating Layer Importance in Large Language Models

no code implementations • 22 Sep 2024 • Yang Zhang, Yanfei Dong, Kenji Kawaguchi

In this study, we advance the understanding of LLMs by investigating the significance of their individual layers.

Data Valuation

Single Character Perturbations Break LLM Alignment

no code implementations • 3 Jul 2024 • Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh

When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not produce unsafe, biased, or privacy-violating outputs.

Self-Evaluation as a Defense Against Adversarial Attacks on LLMs

1 code implementation • 3 Jul 2024 • Hannah Brown, Leon Lin, Kenji Kawaguchi, Michael Shieh

We introduce a defense against adversarial attacks on LLMs utilizing self-evaluation.

Score-fPINN: Fractional Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck-Levy Equations

no code implementations • 17 Jun 2024 • Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi

We introduce an innovative approach for solving high-dimensional Fokker-Planck-Lévy (FPL) equations in modeling non-Brownian processes across disciplines such as physics, finance, and ecology.

Tackling the Curse of Dimensionality in Fractional and Tempered Fractional PDEs with Physics-Informed Neural Networks

no code implementations • 17 Jun 2024 • Zheyuan Hu, Kenji Kawaguchi, Zhongqiang Zhang, George Em Karniadakis

We validate our methods on various forward and inverse problems of fractional and tempered fractional PDEs, scaling up to 100,000 dimensions.

Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

no code implementations • 5 Jun 2024 • Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi

We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized.

In-Context Learning
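
A toy NumPy check of the core identity behind this line of work, under a strong simplification (unnormalized linear attention, no softmax): in-context key/value pairs can be folded into an additive weight update applied to the query. The paper's actual construction is more general; the matrices here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
K, V = rng.normal(size=(5, d)), rng.normal(size=(5, d))    # regular tokens
Kc, Vc = rng.normal(size=(3, d)), rng.normal(size=(3, d))  # in-context demos
q = rng.normal(size=d)                                     # query token

def linear_attention(q, K, V):
    # Unnormalized linear attention: out = sum_j (q . k_j) v_j = (V^T K) q
    return (V.T @ K) @ q

# Attending over [context; regular tokens] ...
with_ctx = linear_attention(q, np.vstack([Kc, K]), np.vstack([Vc, V]))
# ... equals attending over the regular tokens only, with the context folded
# into an additive weight update dW = Vc^T Kc applied to the query.
dW = Vc.T @ Kc
print(np.allclose(with_ctx, linear_attention(q, K, V) + dW @ q))  # True
```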

Learning diverse attacks on large language models for robust red-teaming and safety tuning

no code implementations • 28 May 2024 • Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain

Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs).

Diversity • Language Modelling

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models

no code implementations • 28 May 2024 • Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs).

Decoder

ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

1 code implementation • 23 May 2024 • Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

To resolve the challenges above, we propose a new pretraining method, ReactXT, for reaction-text modeling, and a new dataset, OpenExp, for experimental procedure prediction.

Molecule Captioning • Retrosynthesis

ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

1 code implementation • 21 May 2024 • Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation.

Property Prediction • Question Answering +2

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

2 code implementations • 1 May 2024 • Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh

We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero.

ARC • GSM8K +1
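
For context, a minimal sketch of the AlphaZero-style selection rule (PUCT) that search like this is built on; the paper's iterative preference-learning loop sits on top of such a search, and the numbers below are made up.

```python
import math

def puct(q_value, prior, parent_visits, child_visits, c_puct=1.0):
    """AlphaZero-style selection score: exploit the mean value, explore in
    proportion to the prior and how under-visited the child is."""
    return q_value + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# During tree descent, pick the child that maximizes the score.
children = [
    {"q": 0.2, "prior": 0.5, "visits": 10},  # well-explored, mediocre value
    {"q": 0.6, "prior": 0.1, "visits": 2},   # promising but under-visited
]
best = max(children, key=lambda c: puct(c["q"], c["prior"], 50, c["visits"]))
print(best)
```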

Deep Regression Representation Learning with Topology

1 code implementation • 22 Apr 2024 • Shihao Zhang, Kenji Kawaguchi, Angela Yao

Based on these two connections, we introduce PH-Reg, a regularizer specific to regression that matches the intrinsic dimension and topology of the feature space with the target space.

regression • Representation Learning

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

1 code implementation • 11 Mar 2024 • Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi

Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks.

Image Generation

Towards Robust Out-of-Distribution Generalization Bounds via Sharpness

no code implementations • 11 Mar 2024 • Yingtian Zou, Kenji Kawaguchi, Yingnan Liu, Jiashuo Liu, Mong-Li Lee, Wynne Hsu

To bridge this gap between optimization and OOD generalization, we study the effect of sharpness on how a model tolerates data change in domain shift, which is usually captured by "robustness" in generalization.

Generalization Bounds • Out-of-Distribution Generalization

How do Large Language Models Handle Multilingualism?

1 code implementation • 29 Feb 2024 • Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing

Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow (MWork): LLMs initially understand the query, converting multilingual inputs into English for task-solving.

AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging

1 code implementation • 29 Feb 2024 • Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing

In this paper, we acknowledge the mutual reliance between task ability and language ability and direct our attention toward the gap between the target language and the source language on tasks.

Cross-Lingual Transfer

The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling

no code implementations • 23 Feb 2024 • Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu, Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi

Surprisingly, the improvement persists when we increase the number of sampling steps and can even surpass the best result from EDM-2 (1.58) with only 39 NFEs (1.57).

Decoder • Image Generation

Unsupervised Concept Discovery Mitigates Spurious Correlations

1 code implementation • 20 Feb 2024 • Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi

Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.

Representation Learning

Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck Equations

no code implementations • 12 Feb 2024 • Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi

The score function, defined as the gradient of the LL, plays a fundamental role in inferring LL and PDF and enables fast SDE sampling.
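
A quick NumPy check of the definition quoted above, for a 1-D Gaussian where the score has a closed form: the finite-difference gradient of the log-likelihood matches -(x - mu)/sigma^2.

```python
import numpy as np

mu, sigma = 1.0, 2.0
x = np.linspace(-3.0, 5.0, 9)

def log_pdf(x):
    # Log-likelihood of N(mu, sigma^2)
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

eps = 1e-5
numeric = (log_pdf(x + eps) - log_pdf(x - eps)) / (2 * eps)  # d/dx log p(x)
analytic = -(x - mu) / sigma ** 2                            # closed-form score
print(np.max(np.abs(numeric - analytic)))                    # ~1e-10
```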

Towards 3D Molecule-Text Interpretation in Language Models

1 code implementation • 25 Jan 2024 • Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of a 3D molecular encoder and an LM.

Instruction Following • Language Modelling +2

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

no code implementations • 7 Jan 2024 • Haonan Wang, Qianli Shen, Yao Tong, Yang Zhang, Kenji Kawaguchi

Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data while carefully dispersing that information, making the poisoning data inconspicuous when integrated into a clean dataset.

Backdoor Attack • Data Poisoning +1

Simple Hierarchical Planning with Diffusion

no code implementations • 5 Jan 2024 • Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.

Can AI Be as Creative as Humans?

no code implementations • 3 Jan 2024 • Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi

With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application.

Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks

1 code implementation • 22 Dec 2023 • Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi

We further showcase HTE's convergence to the original PINN loss and its unbiased behavior under specific conditions.
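
A minimal sketch of Hutchinson trace estimation on a plain symmetric matrix standing in for a Hessian (the paper applies it inside PINN losses): tr(A) = E[v^T A v] for Rademacher probes v, which also illustrates the unbiasedness mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
A = A + A.T  # symmetric matrix standing in for a Hessian

# Hutchinson: tr(A) = E[v^T A v] for random v with E[v v^T] = I.
v = rng.choice([-1.0, 1.0], size=(20_000, 50))        # Rademacher probes
estimate = np.einsum("ni,ij,nj->n", v, A, v).mean()   # unbiased estimator
print(estimate, np.trace(A))
```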

Prompt Optimization via Adversarial In-Context Learning

1 code implementation • 5 Dec 2023 • Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He

We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier.

Arithmetic Reasoning • Data-to-Text Generation +3

VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models

1 code implementation • CVPR 2024 • Xiang Li, Qianli Shen, Kenji Kawaguchi

The booming use of text-to-image generative models has raised concerns about their high risk of producing copyright-infringing content.

Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs

no code implementations • 26 Nov 2023 • Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi

To optimize the bias-variance trade-off, we combine the two approaches in a hybrid method that balances the rapid convergence of the biased version with the high accuracy of the unbiased version.

Computational Efficiency

Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

1 code implementation • NeurIPS 2023 • Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning.

Decoder • Representation Learning +1

Self-Supervised Dataset Distillation for Transfer Learning

2 code implementations • 10 Oct 2023 • Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization.

Bilevel Optimization • Dataset Distillation +4

Drug Discovery with Dynamic Goal-aware Fragments

1 code implementation • 2 Oct 2023 • Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang

Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation.

Drug Discovery

On Copyright Risks of Text-to-Image Diffusion Models

no code implementations • 15 Sep 2023 • Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Haonan Wang, Kenji Kawaguchi

Specifically, we introduce a data generation pipeline to systematically produce data for studying copyright in diffusion models.

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

1 code implementation • 17 Aug 2023 • Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.

Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

1 code implementation • 23 Jul 2023 • Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi

We demonstrate in diverse tests that the proposed method can solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman (HJB) and the Schrödinger equations in tens of thousands of dimensions, very fast on a single GPU using the PINNs mesh-free approach.

Multi-View Class Incremental Learning

no code implementations • 16 Jun 2023 • Depeng Li, Tianqi Wang, Junwei Chen, Kenji Kawaguchi, Cheng Lian, Zhigang Zeng

Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.

class-incremental learning • Class Incremental Learning +4

Fast Diffusion Model

1 code implementation • 12 Jun 2023 • Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang

In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a stochastic optimization perspective for both faster training and sampling.

Image Generation

How Does Information Bottleneck Help Deep Learning?

1 code implementation • 30 May 2023 • Kenji Kawaguchi, Zhun Deng, Xu Ji, Jiaoyang Huang

In this paper, we provide the first rigorous learning theory for justifying the benefit of information bottleneck in deep learning by mathematically relating information bottleneck to generalization errors.

Deep Learning • Generalization Bounds +1

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

1 code implementation • NeurIPS 2023 • Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge.

MedQA • Memorization +1

Automatic Model Selection with Large Language Models for Reasoning

1 code implementation • 23 May 2023 • James Xu Zhao, Yuxi Xie, Kenji Kawaguchi, Junxian He, Michael Qizhe Xie

Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths.

Arithmetic Reasoning • GSM8K +4

Boosting Visual-Language Models by Exploiting Hard Samples

1 code implementation • 9 May 2023 • Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi

In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training a model from scratch or collecting additional data.

Retrieval • Zero-Shot Learning

Self-Evaluation Guided Beam Search for Reasoning

no code implementations • NeurIPS 2023 • Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie

Stochastic beam search balances exploitation and exploration of the search space with temperature-controlled randomness.

Arithmetic Reasoning • GSM8K +3
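
A hedged sketch of the temperature-controlled step described above, detached from any LLM: candidates are sampled without replacement in proportion to softmax(score / T), so low temperature is near-greedy beam search and high temperature explores.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_beam_step(scores, beam_width, temperature):
    """Sample beam_width candidates without replacement, with probabilities
    softmax(scores / T): low T is near-greedy, high T is exploratory."""
    logits = np.asarray(scores) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(scores), size=beam_width, replace=False, p=p)

print(stochastic_beam_step([2.0, 1.5, 0.1, -1.0], beam_width=2, temperature=0.5))
```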

Last-Layer Fairness Fine-tuning is Simple and Effective for Neural Networks

2 code implementations • 8 Apr 2023 • Yuzhen Mao, Zhun Deng, Huaxiu Yao, Ting Ye, Kenji Kawaguchi, James Zou

As machine learning has been deployed ubiquitously across applications in modern data science, algorithmic fairness has become a great concern.

Fairness • Open-Ended Question Answering +1

An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization

no code implementations • 1 Mar 2023 • Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann Lecun

Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised learning (SSL) method that has shown promising results on a variety of tasks.

Self-Supervised Learning • Transfer Learning
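
For reference, a compact NumPy version of VICReg's three loss terms as commonly formulated (the hinge target of 1 and the epsilon follow common usage, not necessarily this paper's analysis).

```python
import numpy as np

def vicreg_terms(za, zb, eps=1e-4):
    """Three VICReg terms on two batches of embeddings (rows = samples)."""
    inv = np.mean((za - zb) ** 2)                 # invariance: match the views
    std = np.sqrt(za.var(axis=0) + eps)
    var = np.mean(np.maximum(0.0, 1.0 - std))     # variance: hinge keeps std >= 1
    zc = za - za.mean(axis=0)
    cov = (zc.T @ zc) / (len(za) - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covp = np.sum(off_diag ** 2) / za.shape[1]    # covariance: decorrelate dims
    return inv, var, covp
```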

Auxiliary Learning as an Asymmetric Bargaining Game

1 code implementation • 31 Jan 2023 • Aviv Shamsian, Aviv Navon, Neta Glazer, Kenji Kawaguchi, Gal Chechik, Ethan Fetaya

Auxiliary learning is an effective method for enhancing the generalization capabilities of trained models, particularly when dealing with small datasets.

Auxiliary Learning

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

1 code implementation • 27 Dec 2022 • Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels.

Data Augmentation
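
The vanilla mixup operation that MixupE builds on, as a short NumPy sketch (alpha = 0.2 is a conventional choice, not taken from this paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x, y, alpha=0.2):
    """Vanilla mixup: convex-combine a batch with a shuffled copy of itself."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]
```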

Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph

1 code implementation • 20 Nov 2022 • Haonan Wang, Jieyu Zhang, Qi Zhu, Wei Huang, Kenji Kawaguchi, Xiaokui Xiao

To answer this question, we theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce the single-pass augmentation-free graph contrastive learning loss based on the property, and provide performance guarantees for the minimizer of the loss on downstream tasks.

Contrastive Learning

Neural Active Learning on Heteroskedastic Distributions

1 code implementation • 2 Nov 2022 • Savya Khosla, Chew Kin Whye, Jordan T. Ash, Cyril Zhang, Kenji Kawaguchi, Alex Lamb

To this end, we demonstrate the catastrophic failure of these active learning algorithms on heteroskedastic distributions and propose a fine-tuning-based approach to mitigate these failures.

Active Learning

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

no code implementations • 1 Nov 2022 • Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives.

reinforcement-learning • Reinforcement Learning (RL)

TuneUp: A Simple Improved Training Strategy for Graph Neural Networks

no code implementations • 26 Oct 2022 • Weihua Hu, Kaidi Cao, Kexin Huang, Edward W Huang, Karthik Subbian, Kenji Kawaguchi, Jure Leskovec

Extensive evaluation of TuneUp on five diverse GNN architectures, three types of prediction tasks, and both transductive and inductive settings shows that TuneUp significantly improves the performance of the base GNN on tail nodes, while often even improving the performance on head nodes.

Data Augmentation

GFlowOut: Dropout with Generative Flow Networks

no code implementations • 24 Oct 2022 • Dianbo Liu, Moksh Jain, Bonaventure Dossou, Qianli Shen, Salem Lahlou, Anirudh Goyal, Nikolay Malkin, Chris Emezue, Dinghuai Zhang, Nadhir Hassen, Xu Ji, Kenji Kawaguchi, Yoshua Bengio

These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation.

Bayesian Inference • Variational Inference

MGNNI: Multiscale Graph Neural Networks with Implicit Layers

1 code implementation • 15 Oct 2022 • Juncheng Liu, Bryan Hooi, Kenji Kawaguchi, Xiaokui Xiao

Recently, implicit graph neural networks (GNNs) have been proposed to capture long-range dependencies in underlying graphs.

Graph Classification • Graph Neural Network +1

Self-Distillation for Further Pre-training of Transformers

no code implementations • 30 Sep 2022 • Seanie Lee, Minki Kang, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi

Pre-training a large transformer model on a massive amount of unlabeled data and fine-tuning it on labeled datasets for diverse downstream tasks has proven to be a successful strategy for a variety of vision and natural language processing tasks.

text-classification • Text Classification

Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation

1 code implementation • 26 Aug 2022 • Jeffrey Willette, Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions.

Point Cloud Classification • text-classification +1

Robustness Implies Generalization via Data-Dependent Generalization Bounds

no code implementations • 27 Jun 2022 • Kenji Kawaguchi, Zhun Deng, Kyle Luh, Jiaoyang Huang

This paper proves that robustness implies generalization via data-dependent generalization bounds.

Generalization Bounds

Set-based Meta-Interpolation for Few-Task Meta-Learning

no code implementations • 20 May 2022 • Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

Recently, several task augmentation methods have been proposed to tackle this issue using domain-specific knowledge to design augmentation techniques to densify the meta-training task distribution.

Bilevel Optimization • Image Classification +6

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

1 code implementation • 1 Apr 2022 • Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron Courville

Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation.

Classification • Inductive Bias +1
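
A small NumPy sketch of the projection described above: reshape a representation into L groups of V dimensions and apply a softmax within each group, so every group lies on a probability simplex.

```python
import numpy as np

def simplicial_embedding(z, L, V):
    """Reshape a representation into L groups of V dims and softmax each group,
    so every group lies on a V-dimensional probability simplex."""
    z = z.reshape(L, V)
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).reshape(-1)

sem = simplicial_embedding(np.random.default_rng(0).normal(size=12), L=3, V=4)
print(sem.reshape(3, 4).sum(axis=1))  # each simplex sums to 1
```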

EIGNN: Efficient Infinite-Depth Graph Neural Networks

1 code implementation • NeurIPS 2021 • Juncheng Liu, Kenji Kawaguchi, Bryan Hooi, Yiwei Wang, Xiaokui Xiao

Motivated by this limitation, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN), to efficiently capture very long-range dependencies.

Multi-Task Learning as a Bargaining Game

4 code implementations • 2 Feb 2022 • Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, Ethan Fetaya

In this paper, we propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.

Multi-Task Learning

Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization

no code implementations • 2 Feb 2022 • Dianbo Liu, Alex Lamb, Xu Ji, Pascal Notsawo, Mike Mozer, Yoshua Bengio, Kenji Kawaguchi

Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit.

Quantization • reinforcement-learning +3
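
The basic vector-quantization step the abstract refers to, as a short NumPy sketch (nearest-neighbor lookup in a codebook; the paper's adaptive, dynamic machinery is not shown):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector (row of z) to its nearest codebook entry."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # pairwise L2^2
    idx = d.argmin(axis=1)
    return codebook[idx], idx
```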

ExpertNet: A Symbiosis of Classification and Clustering

no code implementations • 17 Jan 2022 • Shivin Srivastava, Kenji Kawaguchi, Vaibhav Rajan

We theoretically analyze the effect of clustering on its generalization gap, and empirically show that clustered latent representations from ExpertNet lead to disentangling the intrinsic structure and improvement in classification performance.

Classification • Clustering +1

Training Free Graph Neural Networks for Graph Matching

1 code implementation • 14 Jan 2022 • Zhiyuan Liu, Yixin Cao, Fuli Feng, Xiang Wang, Jie Tang, Kenji Kawaguchi, Tat-Seng Chua

We present a framework of Training Free Graph Matching (TFGM) to boost the performance of Graph Neural Network (GNN)-based graph matching, providing a fast, promising solution without training (training-free).

Entity Alignment • Graph Matching +1

Noether Networks: Meta-Learning Useful Conserved Quantities

no code implementations • NeurIPS 2021 • Ferran Alet, Dylan Doblar, Allan Zhou, Joshua Tenenbaum, Kenji Kawaguchi, Chelsea Finn

Progress in machine learning (ML) stems from a combination of data availability, computational resources, and an appropriate encoding of inductive biases.

Meta-Learning • Translation

Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

no code implementations • NeurIPS 2021 • Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Kaelbling

Estimating per-state expected cumulative rewards, however the experience is obtained, is a critical aspect of reinforcement learning, but standard deep neural-network function-approximation methods are often inefficient in this setting.

Model-based Reinforcement Learning • reinforcement-learning +1

Combined Scaling for Zero-shot Transfer Learning

no code implementations • 19 Nov 2021 • Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

Second, while increasing the dataset size and the model size has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well-understood.

Classification • Contrastive Learning +3

When Do Extended Physics-Informed Neural Networks (XPINNs) Improve Generalization?

no code implementations • 20 Sep 2021 • Zheyuan Hu, Ameya D. Jagtap, George Em Karniadakis, Kenji Kawaguchi

Specifically, for general multi-layer PINNs and XPINNs, we first provide a prior generalization bound via the complexity of the target functions in the PDE problem, and a posterior generalization bound via the posterior matrix norms of the networks after optimization.

Meta-learning PINN loss functions

no code implementations • 12 Jul 2021 • Apostolos F Psaros, Kenji Kawaguchi, George Em Karniadakis

In the computational examples, the meta-learned losses are employed at test time for addressing regression and PDE task distributions.

Meta-Learning

Discrete-Valued Neural Communication

no code implementations • NeurIPS 2021 • Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio

Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes.

Quantization • Systematic Generalization

Understanding Dynamics of Nonlinear Representation Learning and Its Application

no code implementations • 28 Jun 2021 • Kenji Kawaguchi, Linjun Zhang, Zhun Deng

Representation learning allows us to automatically discover suitable representations from raw sensory data.

Representation Learning

Adversarial Training Helps Transfer Learning via Better Representations

no code implementations • NeurIPS 2021 • Zhun Deng, Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, James Zou

Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains.

Transfer Learning

Sketch-Based Anomaly Detection in Streaming Graphs

1 code implementation • 8 Jun 2021 • Siddharth Bhatia, Mohit Wadhwa, Kenji Kawaguchi, Neil Shah, Philip S. Yu, Bryan Hooi

This higher-order sketch has the useful property of preserving the dense subgraph structure (dense subgraphs in the input turn into dense submatrices in the data structure).

Anomaly Detection • Intrusion Detection
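
For intuition, a tiny count-min sketch, the classic fixed-memory counting structure that sketch-based streaming detectors generalize; the paper's higher-order sketch additionally preserves dense-submatrix structure, which this simple version does not.

```python
import numpy as np

class CountMinSketch:
    """Fixed-memory approximate counting: query() over-estimates by at most
    the collisions in the least-loaded of a few hashed rows."""
    def __init__(self, rows=4, cols=1024, seed=0):
        self.seeds = np.random.default_rng(seed).integers(1, 2**31, size=rows)
        self.table = np.zeros((rows, cols), dtype=np.int64)

    def _cols(self, key):
        return [hash((int(s), key)) % self.table.shape[1] for s in self.seeds]

    def add(self, key, count=1):
        for r, c in enumerate(self._cols(key)):
            self.table[r, c] += count

    def query(self, key):
        return min(self.table[r, c] for r, c in enumerate(self._cols(key)))
```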

MemStream: Memory-Based Streaming Anomaly Detection

1 code implementation • 7 Jun 2021 • Siddharth Bhatia, Arjit Jain, Shivin Srivastava, Kenji Kawaguchi, Bryan Hooi

Given a stream of entries over time in a multi-dimensional data setting where concept drift is present, how can we detect anomalous activities?

Denoising • Unsupervised Anomaly Detection
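
A hedged sketch of the memory-based scoring idea: score a point by its distance to the nearest remembered normal record, and let normal-looking points refresh a FIFO memory. The threshold and memory size below are arbitrary, and the paper's learned (denoising-autoencoder) features are omitted.

```python
import numpy as np

def stream_scores(points, mem_size=64, threshold=2.0):
    """Score each point by distance to its nearest memory item; points that
    look normal (low score) replace the oldest memory entry (FIFO)."""
    memory = [np.asarray(p) for p in points[:mem_size]]
    scores = []
    for x in points[mem_size:]:
        s = min(np.linalg.norm(x - m) for m in memory)
        scores.append(s)
        if s < threshold:  # only trust normal-looking points to update memory
            memory.pop(0)
            memory.append(np.asarray(x))
    return scores
```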

Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions

2 code implementations • 20 May 2021 • Ameya D. Jagtap, Yeonjong Shin, Kenji Kawaguchi, George Em Karniadakis

We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions.

Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

no code implementations • 10 May 2021 • Keyulu Xu, Mozhi Zhang, Stefanie Jegelka, Kenji Kawaguchi

Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution.

A Recipe for Global Convergence Guarantee in Deep Neural Networks

no code implementations • 12 Apr 2021 • Kenji Kawaguchi, Qingyun Sun

Existing global convergence guarantees of (stochastic) gradient descent do not apply to practical deep networks in the practical regime of deep learning beyond the neural tangent kernel (NTK) regime.

Clustering Aware Classification for Risk Prediction and Subtyping in Clinical Data

1 code implementation • 23 Feb 2021 • Shivin Srivastava, Siddharth Bhatia, Lingxiao Huang, Lim Jun Heng, Kenji Kawaguchi, Vaibhav Rajan

In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier.

Classification • Clustering +2

On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

no code implementations • 15 Feb 2021 • Kenji Kawaguchi

In this paper, we analyze the gradient dynamics of deep equilibrium models with nonlinearity only on weight matrices and non-convex objective functions of weights for regression and classification.

Relation

When and How Mixup Improves Calibration

no code implementations • 11 Feb 2021 • Linjun Zhang, Zhun Deng, Kenji Kawaguchi, James Zou

In addition, we study how Mixup improves calibration in semi-supervised learning.

Data Augmentation

Dynamics of Deep Equilibrium Linear Models

no code implementations • ICLR 2021 • Kenji Kawaguchi

A deep equilibrium linear model is implicitly defined through an equilibrium point of an infinite sequence of computation.

Relation
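
A minimal NumPy illustration of such a model: iterating z ← Wz + Ux to its fixed point (with W scaled to be a contraction) reproduces the closed form z* = (I - W)^{-1} U x.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))
W *= 0.5 / np.linalg.norm(W, 2)  # spectral norm 0.5: the iteration contracts
U = rng.normal(size=(d, d))
x = rng.normal(size=d)

# Equilibrium of z = W z + U x, i.e. the limit of infinitely many tied layers.
z = np.zeros(d)
for _ in range(200):
    z = W @ z + U @ x
print(np.allclose(z, np.linalg.solve(np.eye(d) - W, U @ x)))  # True
```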

Towards Domain-Agnostic Contrastive Learning

no code implementations • 9 Nov 2020 • Vikas Verma, Minh-Thang Luong, Kenji Kawaguchi, Hieu Pham, Quoc V. Le

Despite recent success, most contrastive self-supervised learning methods are domain-specific, relying heavily on data augmentation techniques that require knowledge about a particular domain, such as image cropping and rotation.

Contrastive Learning • Data Augmentation +3

How Does Mixup Help With Robustness and Generalization?

no code implementations • ICLR 2021 • Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou

For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss.

Data Augmentation

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

1 code implementation • 25 Sep 2019 • Vikas Verma, Meng Qu, Kenji Kawaguchi, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization.

Generalization Bounds • Graph Attention +2

Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes

no code implementations • 5 Aug 2019 • Kenji Kawaguchi, Jiaoyang Huang

The theory developed in this paper only requires the practical degrees of over-parameterization unlike previous theories.

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

2 code implementations • 9 Jul 2019 • Kenji Kawaguchi, Haihao Lu

The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss.

Stochastic Optimization
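
The contrast with the unbiased estimator can be stated in a few lines: Ordered SGD averages the gradients of only the q largest per-sample losses in the mini-batch. A schematic NumPy sketch (the gradient representation is left abstract):

```python
import numpy as np

def ordered_sgd_update(per_sample_losses, per_sample_grads, q):
    """Average the gradients of only the q largest per-sample losses,
    rather than of the whole mini-batch as in plain SGD."""
    top = np.argsort(per_sample_losses)[-q:]
    return np.mean(np.asarray(per_sample_grads)[top], axis=0)
```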

Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning

no code implementations • 7 Apr 2019 • Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling

Furthermore, as special cases of our general results, this article improves or complements several state-of-the-art theoretical results on deep neural networks, deep residual networks, and overparameterized deep neural networks with a unified proof technique and novel geometric insights.

BIG-bench Machine Learning • Representation Learning

Interpolation Consistency Training for Semi-Supervised Learning

4 code implementations • 9 Mar 2019 • Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm.

General Classification • Semi-Supervised Image Classification
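
A compact sketch of the ICT consistency term: the model's prediction at an interpolated input is pushed toward the interpolation of the (mean-)teacher's predictions. The stand-in model and lam value below are arbitrary.

```python
import numpy as np

def ict_loss(student, teacher, x1, x2, lam=0.3):
    """Consistency term: the student's prediction at the interpolated input
    should match the interpolation of the teacher's predictions."""
    mixed = lam * x1 + (1 - lam) * x2
    target = lam * teacher(x1) + (1 - lam) * teacher(x2)
    return np.mean((student(mixed) - target) ** 2)

f = lambda x: np.tanh(x)  # stand-in model
print(ict_loss(f, f, np.ones(3), -np.ones(3)))  # > 0 since tanh is not affine
```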

Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit

no code implementations • 12 Jan 2019 • Jascha Sohl-Dickstein, Kenji Kawaguchi

Recent work has noted that all bad local minima can be removed from neural network loss landscapes, by adding a single unit with a particular parameterization.

Elimination of All Bad Local Minima in Deep Learning

no code implementations • 2 Jan 2019 • Kenji Kawaguchi, Leslie Pack Kaelbling

At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network.

Binary Classification • Deep Learning +2

Effect of Depth and Width on Local Minima in Deep Learning

no code implementations • 20 Nov 2018 • Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling

In this paper, we analyze the effects of depth and width on the quality of local minima, without strong over-parameterization and simplification assumptions in the literature.

Depth with Nonlinearity Creates No Bad Local Minima in ResNets

no code implementations • 21 Oct 2018 • Kenji Kawaguchi, Yoshua Bengio

In this paper, we prove that depth with nonlinearity creates no bad local minima in a type of arbitrarily deep ResNets with arbitrary nonlinear activation functions, in the sense that the values of all local minima are no worse than the global minimum value of corresponding classical machine-learning models, and are guaranteed to further improve via residual representations.

BIG-bench Machine Learning • Open-Ended Question Answering

Generalization in Machine Learning via Analytical Learning Theory

2 code implementations • 21 Feb 2018 • Kenji Kawaguchi, Yoshua Bengio, Vikas Verma, Leslie Pack Kaelbling

This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions.

BIG-bench Machine Learning • Deep Learning +3

Theory of Deep Learning III: explaining the non-overfitting puzzle

no code implementations • 30 Dec 2017 • Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar

In this note, we show that the dynamics associated with gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to a linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or cross-entropy loss) Hessian.

Deep Learning • General Classification

Generalization in Deep Learning

no code implementations • 16 Oct 2017 • Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio

This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature.

Deep Learning • Open-Ended Question Answering

Deep Semi-Random Features for Nonlinear Function Approximation

1 code implementation • 28 Feb 2017 • Kenji Kawaguchi, Bo Xie, Vikas Verma, Le Song

For deep models, with no unrealistic assumptions, we prove universal approximation ability, a lower bound on approximation error, a partial optimization guarantee, and a generalization bound.

Depth Creates No Bad Local Minima

no code implementations • 27 Feb 2017 • Haihao Lu, Kenji Kawaguchi

In deep learning, depth, as well as nonlinearity, creates non-convex loss surfaces.

Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

no code implementations • 19 Oct 2016 • Qianli Liao, Kenji Kawaguchi, Tomaso Poggio

We systematically explored a spectrum of normalization algorithms related to Batch Normalization (BN) and propose a generalized formulation that simultaneously solves two major limitations of BN: (1) online learning and (2) recurrent learning.
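
A minimal sketch of the streaming idea in one dimension, using exponential running statistics instead of batch statistics; the momentum value is arbitrary, and this is not the paper's exact algorithm.

```python
import numpy as np

def streaming_normalize(xs, momentum=0.99, eps=1e-5):
    """Normalize one sample at a time with exponential running statistics,
    instead of the per-batch statistics used by batch normalization."""
    mean, var, out = 0.0, 1.0, []
    for x in xs:
        mean = momentum * mean + (1 - momentum) * x
        var = momentum * var + (1 - momentum) * (x - mean) ** 2
        out.append((x - mean) / np.sqrt(var + eps))
    return out
```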

Global Continuous Optimization with Error Bound and Fast Convergence

no code implementations • 17 Jul 2016 • Kenji Kawaguchi, Yu Maruyama, Xiaoyu Zheng

This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), that aims for both fast convergence in practice and a finite-time error bound in theory.

Management

Deep Learning without Poor Local Minima

1 code implementation • NeurIPS 2016 • Kenji Kawaguchi

In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015.

Deep Learning • Learning Theory

Bounded Optimal Exploration in MDP

no code implementations • 5 Apr 2016 • Kenji Kawaguchi

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration.

Bayesian Optimization with Exponential Convergence

no code implementations • NeurIPS 2015 • Kenji Kawaguchi, Leslie Pack Kaelbling, Tomás Lozano-Pérez

This paper presents a Bayesian optimization method with exponential convergence without the need of auxiliary optimization and without the delta-cover sampling.

Bayesian Optimization
