no code implementations • 3 Dec 2024 • Yang Zhang, Er Jin, Yanfei Dong, Ashkan Khakzar, Philip Torr, Johannes Stegmaier, Kenji Kawaguchi
Diffusion models have achieved impressive advancements in various vision tasks.
1 code implementation • 20 Nov 2024 • Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang
To address this, we develop AnchorAttention, a plug-and-play attention method that alleviates numerical issues caused by BFloat16, improves long-context capabilities, and speeds up training.
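As a hedged illustration of the underlying numerical issue (not the AnchorAttention method itself, and assuming PyTorch is available), the sketch below shows how bfloat16's roughly 8-bit significand collapses neighboring large position indices onto the same representable value, which is one way long-context computations can degrade:

```python
import torch

# Illustrative only: bfloat16 keeps ~8 significand bits, so consecutive
# integer positions beyond a few hundred round to the same value.
positions = torch.arange(1000, 1010, dtype=torch.float32)
print(positions.to(torch.bfloat16).to(torch.float32))
# Several neighboring positions print as identical values after the cast.
```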
no code implementations • 8 Nov 2024 • Esther Gan, Yiran Zhao, Liying Cheng, Yancan Mao, Anirudh Goyal, Kenji Kawaguchi, Min-Yen Kan, Michael Shieh
It shows that LLMs are sensitive to minimal adversarial typographical changes.
1 code implementation • 1 Nov 2024 • Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen
We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve the large language model (LLM) generation.
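A minimal sketch of the general multi-expert prompting idea, assuming a hypothetical `llm(prompt)` completion function; the paper's actual expert generation and aggregation procedure may differ:

```python
def multi_expert_answer(llm, question, experts=("physician", "lawyer", "economist")):
    """Query one LLM under several expert personas, then ask it to aggregate.

    `llm` is a hypothetical callable mapping a prompt string to a completion string.
    """
    drafts = [llm(f"You are a {role}. Answer concisely: {question}") for role in experts]
    aggregation_prompt = (
        f"Question: {question}\n"
        + "\n".join(f"Expert {i + 1} answer: {d}" for i, d in enumerate(drafts))
        + "\nCombine these expert answers into one best final answer."
    )
    return llm(aggregation_prompt)
```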
no code implementations • 22 Sep 2024 • Yang Zhang, Yanfei Dong, Kenji Kawaguchi
In this study, we advance the understanding of LLMs by investigating the significance of individual layers.
no code implementations • 5 Sep 2024 • Zheyuan Hu, Nazanin Ahmadi Daryakenari, Qianli Shen, Kenji Kawaguchi, George Em Karniadakis
We demonstrate Mamba's superior performance in both interpolation and challenging extrapolation tasks.
1 code implementation • 16 Aug 2024 • Do Xuan Long, Hai Nguyen Ngoc, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan
We present the first systematic evaluation examining format bias in the performance of large language models (LLMs).
no code implementations • 3 Jul 2024 • Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh
When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not output unsafe, biased, or privacy-violating outputs.
1 code implementation • 3 Jul 2024 • Hannah Brown, Leon Lin, Kenji Kawaguchi, Michael Shieh
We introduce a defense against adversarial attacks on LLMs utilizing self-evaluation.
no code implementations • 20 Jun 2024 • Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, Jonathan Scarlett, Zhanxing Zhu, Kenji Kawaguchi
Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems.
no code implementations • 17 Jun 2024 • Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi
We introduce an innovative approach for solving high-dimensional Fokker-Planck-Lévy (FPL) equations in modeling non-Brownian processes across disciplines such as physics, finance, and ecology.
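For reference, one common way to write a Fokker-Planck equation with a Lévy (fractional-Laplacian) jump term is shown below; the exact form and coefficients used in the paper may differ:

```latex
\frac{\partial p(x,t)}{\partial t}
  = -\nabla \cdot \big(\mu(x)\, p(x,t)\big)
  + \frac{\sigma^{2}}{2}\,\Delta p(x,t)
  - \lambda\,(-\Delta)^{\alpha/2} p(x,t),
\qquad 0 < \alpha < 2 .
```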
no code implementations • 17 Jun 2024 • Zheyuan Hu, Kenji Kawaguchi, Zhongqiang Zhang, George Em Karniadakis
We validate our methods on various forward and inverse problems of fractional and tempered fractional PDEs, scaling up to 100,000 dimensions.
1 code implementation • 10 Jun 2024 • Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn
At the low level, we used a Q-learning based approach called the Q-Performer to accomplish these sub-goals.
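For context, the standard tabular Q-learning update that such approaches build on is the following (the Q-Performer itself uses a learned function approximator, and its exact objective is not reproduced here):

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \Big( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big)
```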
no code implementations • 5 Jun 2024 • Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi
We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized.
no code implementations • 28 May 2024 • Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs).
no code implementations • 28 May 2024 • Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi
Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs).
1 code implementation • 23 May 2024 • Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
To resolve the challenges above, we propose a new pretraining method, ReactXT, for reaction-text modeling, and a new dataset, OpenExp, for experimental procedure prediction.
1 code implementation • 21 May 2024 • Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation.
2 code implementations • 1 May 2024 • Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero.
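As a hedged reference point, a widely used preference-learning objective of this general kind is the DPO loss (Rafailov et al., 2023), shown below; the paper's MCTS-based iterative variant is not reproduced here:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
 = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
 \log \sigma\!\left(
   \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
 - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
 \right)
```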
1 code implementation • 22 Apr 2024 • Shihao Zhang, Kenji Kawaguchi, Angela Yao
Based on these two connections, we introduce PH-Reg, a regularizer specific to regression that matches the intrinsic dimension and topology of the feature space with the target space.
1 code implementation • 11 Mar 2024 • Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi
Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks.
no code implementations • 11 Mar 2024 • Yingtian Zou, Kenji Kawaguchi, Yingnan Liu, Jiashuo Liu, Mong-Li Lee, Wynne Hsu
To bridge this gap between optimization and OOD generalization, we study the effect of sharpness on how a model tolerates data change in domain shift which is usually captured by "robustness" in generalization.
1 code implementation • 2 Mar 2024 • Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh
Safety of Large Language Models (LLMs) has become a critical issue given their rapid progress.
1 code implementation • 29 Feb 2024 • Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing
Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow ($\texttt{MWork}$): LLMs initially understand the query, converting multilingual inputs into English for task-solving.
1 code implementation • 29 Feb 2024 • Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing
In this paper, we acknowledge the mutual reliance between task ability and language ability and direct our attention toward the gap between the target language and the source language on tasks.
no code implementations • 26 Feb 2024 • Xuantong Liu, Tianyang Hu, Wenjia Wang, Kenji Kawaguchi, Yuan YAO
In this work, we aim to address this alignment challenge for conditional generation tasks.
no code implementations • 23 Feb 2024 • Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu, Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi
Surprisingly, the improvement persists when we increase the number of sampling steps and can even surpass the best result from EDM-2 (1.58) with only 39 NFEs (1.57).
1 code implementation • 20 Feb 2024 • Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi
Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.
no code implementations • 12 Feb 2024 • Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi
The score function, defined as the gradient of the LL, plays a fundamental role in inferring LL and PDF and enables fast SDE sampling.
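In the notation of the snippet above (LL = log-likelihood, PDF = probability density function), the score function is simply:

```latex
s(x) \;=\; \nabla_x \log p(x)
```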
1 code implementation • 25 Jan 2024 • Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian
Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of 3D molecular encoder and LM.
no code implementations • 17 Jan 2024 • Depeng Li, Tianqi Wang, Junwei Chen, Qining Ren, Kenji Kawaguchi, Zhigang Zeng
Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks.
no code implementations • 7 Jan 2024 • Haonan Wang, Qianli Shen, Yao Tong, Yang Zhang, Kenji Kawaguchi
Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data while carefully dispersing that information, making the poisoning data inconspicuous when integrated into a clean dataset.
no code implementations • 5 Jan 2024 • Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
no code implementations • 3 Jan 2024 • Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi
With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application.
1 code implementation • 22 Dec 2023 • Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi
We further showcase HTE's convergence to the original PINN loss and its unbiased behavior under specific conditions.
1 code implementation • 5 Dec 2023 • Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He
We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier.
no code implementations • 1 Dec 2023 • Kerui Gu, Zhihao LI, Shiyong Liu, Jianzhuang Liu, Songcen Xu, Youliang Yan, Michael Bi Mi, Kenji Kawaguchi, Angela Yao
Estimating 3D rotations is a common procedure for 3D computer vision.
Ranked #15 on 3D Human Pose Estimation on 3DPW
1 code implementation • CVPR 2024 • Xiang Li, Qianli Shen, Kenji Kawaguchi
The booming use of text-to-image generative models has raised concerns about their high risk of producing copyright-infringing content.
no code implementations • 26 Nov 2023 • Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi
To optimize the bias-variance trade-off, we combine the two approaches in a hybrid method that balances the rapid convergence of the biased version with the high accuracy of the unbiased version.
no code implementations • 14 Nov 2023 • Do Xuan Long, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen
Reasoning and predicting human opinions with large language models (LLMs) is essential yet challenging.
1 code implementation • NeurIPS 2023 • Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua
Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning.
1 code implementation • 19 Oct 2023 • Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua
MolCA enables an LM (e.g., Galactica) to understand both text- and graph-based molecular contents via the cross-modal projector.
Ranked #5 on Molecule Captioning on ChEBI-20
no code implementations • 10 Oct 2023 • Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi
Feature attribution explains neural network outputs by identifying relevant input features.
2 code implementations • 10 Oct 2023 • Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization.
1 code implementation • 2 Oct 2023 • Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang
Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation.
no code implementations • 15 Sep 2023 • Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Haonan Wang, Kenji Kawaguchi
Specifically, we introduce a data generation pipeline to systematically produce data for studying copyright in diffusion models.
1 code implementation • 17 Aug 2023 • Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei
We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
1 code implementation • 23 Jul 2023 • Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi
We demonstrate in diverse tests that the proposed method can solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman (HJB) and the Schrödinger equations in tens of thousands of dimensions very fast on a single GPU using the PINNs mesh-free approach.
no code implementations • 18 Jun 2023 • Depeng Li, Tianqi Wang, Bingrong Xu, Kenji Kawaguchi, Zhigang Zeng, Ponnuthurai Nagaratnam Suganthan
Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge.
no code implementations • 16 Jun 2023 • Depeng Li, Tianqi Wang, Junwei Chen, Kenji Kawaguchi, Cheng Lian, Zhigang Zeng
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
1 code implementation • 12 Jun 2023 • Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a stochastic optimization perspective for both faster training and sampling.
1 code implementation • 30 May 2023 • Kenji Kawaguchi, Zhun Deng, Xu Ji, Jiaoyang Huang
In this paper, we provide the first rigorous learning theory for justifying the benefit of information bottleneck in deep learning by mathematically relating information bottleneck to generalization errors.
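For reference, the classical information bottleneck objective trades off compression of the input X against retention of information about the target Y in a representation Z; the paper relates such objectives to generalization errors rather than defining a new one:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta\, I(Z; Y)
```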
1 code implementation • NeurIPS 2023 • Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge.
1 code implementation • 23 May 2023 • James Xu Zhao, Yuxi Xie, Kenji Kawaguchi, Junxian He, Michael Qizhe Xie
Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths.
Ranked #2 on Math Word Problem Solving on SVAMP
1 code implementation • 9 May 2023 • Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi
In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training a model from scratch or collecting additional data.
no code implementations • NeurIPS 2023 • Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie
Stochastic beam search balances exploitation and exploration of the search space with temperature-controlled randomness.
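A minimal sketch of temperature-controlled sampling over candidate scores, assuming NumPy; stochastic beam search applies this kind of randomness while retaining the top-scoring partial sequences:

```python
import numpy as np

def sample_with_temperature(scores, temperature=1.0, rng=None):
    # Low temperature -> near-greedy exploitation; high temperature -> more exploration.
    rng = rng or np.random.default_rng()
    logits = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)
```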
2 code implementations • 8 Apr 2023 • Yuzhen Mao, Zhun Deng, Huaxiu Yao, Ting Ye, Kenji Kawaguchi, James Zou
As machine learning has been deployed ubiquitously across applications in modern data science, algorithmic fairness has become a great concern.
no code implementations • 1 Mar 2023 • Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann Lecun
Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised learning (SSL) method that has shown promising results on a variety of tasks.
no code implementations • 1 Mar 2023 • Tianbo Li, Min Lin, Zheyuan Hu, Kunhao Zheng, Giovanni Vignale, Kenji Kawaguchi, A. H. Castro Neto, Kostya S. Novoselov, Shuicheng Yan
Kohn-Sham Density Functional Theory (KS-DFT) has been traditionally solved by the Self-Consistent Field (SCF) method.
1 code implementation • 31 Jan 2023 • Aviv Shamsian, Aviv Navon, Neta Glazer, Kenji Kawaguchi, Gal Chechik, Ethan Fetaya
Auxiliary learning is an effective method for enhancing the generalization capabilities of trained models, particularly when dealing with small datasets.
1 code implementation • 27 Dec 2022 • Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi
Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels.
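The mixup operation described above is small enough to show directly; a minimal NumPy version, assuming one-hot labels:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # Draw a mixing coefficient from Beta(alpha, alpha) and interpolate
    # both the inputs and their one-hot labels.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```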
1 code implementation • 20 Nov 2022 • Haonan Wang, Jieyu Zhang, Qi Zhu, Wei Huang, Kenji Kawaguchi, Xiaokui Xiao
To answer this question, we theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce the single-pass augmentation-free graph contrastive learning loss based on the property, and provide performance guarantees for the minimizer of the loss on downstream tasks.
1 code implementation • 16 Nov 2022 • Zheyuan Hu, Ameya D. Jagtap, George Em Karniadakis, Kenji Kawaguchi
We also show cases where XPINN is already better than PINN, so APINN can still slightly improve XPINN.
1 code implementation • 2 Nov 2022 • Savya Khosla, Chew Kin Whye, Jordan T. Ash, Cyril Zhang, Kenji Kawaguchi, Alex Lamb
To this end, we demonstrate the catastrophic failure of these active learning algorithms on heteroskedastic distributions and propose a fine-tuning-based approach to mitigate these failures.
no code implementations • 1 Nov 2022 • Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes
Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reach a diverse set of objectives.
no code implementations • 26 Oct 2022 • Weihua Hu, Kaidi Cao, Kexin Huang, Edward W Huang, Karthik Subbian, Kenji Kawaguchi, Jure Leskovec
Extensive evaluation of TuneUp on five diverse GNN architectures, three types of prediction tasks, and both transductive and inductive settings shows that TuneUp significantly improves the performance of the base GNN on tail nodes, while often even improving the performance on head nodes.
no code implementations • 24 Oct 2022 • Dianbo Liu, Moksh Jain, Bonaventure Dossou, Qianli Shen, Salem Lahlou, Anirudh Goyal, Nikolay Malkin, Chris Emezue, Dinghuai Zhang, Nadhir Hassen, Xu Ji, Kenji Kawaguchi, Yoshua Bengio
These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation.
1 code implementation • 15 Oct 2022 • Juncheng Liu, Bryan Hooi, Kenji Kawaguchi, Xiaokui Xiao
Recently, implicit graph neural networks (GNNs) have been proposed to capture long-range dependencies in underlying graphs.
no code implementations • 30 Sep 2022 • Seanie Lee, Minki Kang, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi
Pre-training a large transformer model on a massive amount of unlabeled data and fine-tuning it on labeled datasets for diverse downstream tasks has proven to be a successful strategy for a variety of vision and natural language processing tasks.
1 code implementation • 26 Aug 2022 • Jeffrey Willette, Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions.
1 code implementation • 22 Jul 2022 • Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf
Deep neural networks perform well on classification tasks where data streams are i.i.d.
no code implementations • 27 Jun 2022 • Kenji Kawaguchi, Zhun Deng, Kyle Luh, Jiaoyang Huang
This paper proves that robustness implies generalization via data-dependent generalization bounds.
no code implementations • 20 May 2022 • Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
Recently, several task augmentation methods have been proposed to tackle this issue using domain-specific knowledge to design augmentation techniques to densify the meta-training task distribution.
1 code implementation • 1 Apr 2022 • Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron Courville
Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation.
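A minimal sketch of the projection described above, assuming NumPy and that a learned linear map has already produced a vector of size L*V (the learned projection itself is omitted):

```python
import numpy as np

def simplicial_embedding(z, L, V):
    # Reshape into L groups of V dimensions and apply a softmax to each group,
    # so every group lies on a V-dimensional probability simplex.
    z = np.asarray(z, dtype=float).reshape(L, V)
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).reshape(-1)
```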
1 code implementation • NeurIPS 2021 • Juncheng Liu, Kenji Kawaguchi, Bryan Hooi, Yiwei Wang, Xiaokui Xiao
Motivated by this limitation, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN), to efficiently capture very long-range dependencies.
4 code implementations • 2 Feb 2022 • Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, Ethan Fetaya
In this paper, we propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
Ranked #2 on Multi-Task Learning on Cityscapes test
no code implementations • 2 Feb 2022 • Dianbo Liu, Alex Lamb, Xu Ji, Pascal Notsawo, Mike Mozer, Yoshua Bengio, Kenji Kawaguchi
Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit.
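A minimal sketch of the vector-quantization step itself, assuming NumPy; the straight-through gradient trick used during training is omitted:

```python
import numpy as np

def vector_quantize(latents, codebook):
    # Replace each latent vector with its nearest codebook entry (L2 distance).
    # latents: (N, D), codebook: (K, D) -> returns quantized (N, D) and indices (N,).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx
```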
no code implementations • 17 Jan 2022 • Shivin Srivastava, Kenji Kawaguchi, Vaibhav Rajan
We theoretically analyze the effect of clustering on its generalization gap, and empirically show that clustered latent representations from ExpertNet lead to disentangling the intrinsic structure and improvement in classification performance.
1 code implementation • 14 Jan 2022 • Zhiyuan Liu, Yixin Cao, Fuli Feng, Xiang Wang, Jie Tang, Kenji Kawaguchi, Tat-Seng Chua
We present a framework of Training Free Graph Matching (TFGM) to boost the performance of Graph Neural Networks (GNNs) based graph matching, providing a fast promising solution without training (training-free).
no code implementations • NeurIPS 2021 • Ferran Alet, Dylan Doblar, Allan Zhou, Joshua Tenenbaum, Kenji Kawaguchi, Chelsea Finn
Progress in machine learning (ML) stems from a combination of data availability, computational resources, and an appropriate encoding of inductive biases.
no code implementations • NeurIPS 2021 • Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Kaelbling
Estimating the per-state expected cumulative rewards is a critical aspect of reinforcement learning approaches, regardless of how the experience is obtained, yet standard deep neural-network function-approximation methods are often inefficient in this setting.
no code implementations • 19 Nov 2021 • Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le
Second, while increasing the dataset size and the model size has been the defacto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well-understood.
no code implementations • 20 Sep 2021 • Zheyuan Hu, Ameya D. Jagtap, George Em Karniadakis, Kenji Kawaguchi
Specifically, for general multi-layer PINNs and XPINNs, we first provide a prior generalization bound via the complexity of the target functions in the PDE problem, and a posterior generalization bound via the posterior matrix norms of the networks after optimization.
no code implementations • 12 Jul 2021 • Apostolos F Psaros, Kenji Kawaguchi, George Em Karniadakis
In the computational examples, the meta-learned losses are employed at test time for addressing regression and PDE task distributions.
no code implementations • NeurIPS 2021 • Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio
Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes.
no code implementations • 28 Jun 2021 • Kenji Kawaguchi, Linjun Zhang, Zhun Deng
Representation learning allows us to automatically discover suitable representations from raw sensory data.
no code implementations • NeurIPS 2021 • Zhun Deng, Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, James Zou
Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains.
1 code implementation • 8 Jun 2021 • Siddharth Bhatia, Mohit Wadhwa, Kenji Kawaguchi, Neil Shah, Philip S. Yu, Bryan Hooi
This higher-order sketch has the useful property of preserving the dense subgraph structure (dense subgraphs in the input turn into dense submatrices in the data structure).
1 code implementation • 7 Jun 2021 • Siddharth Bhatia, Arjit Jain, Shivin Srivastava, Kenji Kawaguchi, Bryan Hooi
Given a stream of entries over time in a multi-dimensional data setting where concept drift is present, how can we detect anomalous activities?
2 code implementations • 20 May 2021 • Ameya D. Jagtap, Yeonjong Shin, Kenji Kawaguchi, George Em Karniadakis
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions.
no code implementations • 10 May 2021 • Keyulu Xu, Mozhi Zhang, Stefanie Jegelka, Kenji Kawaguchi
Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution.
no code implementations • 12 Apr 2021 • Kenji Kawaguchi, Qingyun Sun
Existing global convergence guarantees of (stochastic) gradient descent do not apply to practical deep networks in the practical regime of deep learning beyond the neural tangent kernel (NTK) regime.
1 code implementation • 23 Feb 2021 • Shivin Srivastava, Siddharth Bhatia, Lingxiao Huang, Lim Jun Heng, Kenji Kawaguchi, Vaibhav Rajan
In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier.
no code implementations • 15 Feb 2021 • Kenji Kawaguchi
In this paper, we analyze the gradient dynamics of deep equilibrium models with nonlinearity only on weight matrices and non-convex objective functions of weights for regression and classification.
no code implementations • 11 Feb 2021 • Linjun Zhang, Zhun Deng, Kenji Kawaguchi, James Zou
In addition, we study how Mixup improves calibration in semi-supervised learning.
no code implementations • ICLR 2021 • Kenji Kawaguchi
A deep equilibrium linear model is implicitly defined through an equilibrium point of an infinite sequence of computation.
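As a hedged illustration (notation ours, not necessarily the paper's), such a model can be viewed as the fixed point of a linear iteration, which has a closed form when the iteration is contractive:

```latex
z_{k+1} = A z_k + B x
\quad\Longrightarrow\quad
z^{*} = (I - A)^{-1} B x
\quad \text{when } \rho(A) < 1 .
```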
no code implementations • 9 Nov 2020 • Vikas Verma, Minh-Thang Luong, Kenji Kawaguchi, Hieu Pham, Quoc V. Le
Despite recent success, most contrastive self-supervised learning methods are domain-specific, relying heavily on data augmentation techniques that require knowledge about a particular domain, such as image cropping and rotation.
no code implementations • ICLR 2021 • Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou
For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss.
no code implementations • NeurIPS 2021 • Ferran Alet, Maria Bauza, Kenji Kawaguchi, Nurullah Giray Kuru, Tomas Lozano-Perez, Leslie Pack Kaelbling
Adding auxiliary losses to the main objective function is a general way of encoding biases that can help networks learn better representations.
no code implementations • 25 Sep 2019 • Ameya D. Jagtap, Kenji Kawaguchi, George Em. Karniadakis
Furthermore, the proposed methods with the slope recovery are shown to accelerate the training process.
1 code implementation • 25 Sep 2019 • Vikas Verma, Meng Qu, Kenji Kawaguchi, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang
We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization.
Ranked #1 on Node Classification on Pubmed random partition
no code implementations • 5 Aug 2019 • Kenji Kawaguchi, Jiaoyang Huang
The theory developed in this paper only requires the practical degrees of over-parameterization unlike previous theories.
2 code implementations • 9 Jul 2019 • Kenji Kawaguchi, Haihao Lu
The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss.
3 code implementations • 16 Jun 2019 • Alex Lamb, Vikas Verma, Kenji Kawaguchi, Alexander Matyasko, Savya Khosla, Juho Kannala, Yoshua Bengio
Adversarial robustness has become a central goal in deep learning, both in the theory and the practice.
no code implementations • 7 Apr 2019 • Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling
Furthermore, as special cases of our general results, this article improves or complements several state-of-the-art theoretical results on deep neural networks, deep residual networks, and overparameterized deep neural networks with a unified proof technique and novel geometric insights.
4 code implementations • 9 Mar 2019 • Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz
We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm.
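A hedged sketch of an interpolation-consistency loss of this kind (not necessarily the paper's exact formulation): predictions on interpolated unlabeled inputs are encouraged to match the interpolation of teacher predictions, with Mix_λ(a, b) = λa + (1 − λ)b and θ' a slowly updated copy of θ:

```latex
\mathcal{L}_{\mathrm{consistency}}
 = \mathbb{E}_{u_i,\, u_j,\, \lambda}
   \,\Big\| f_\theta\big(\mathrm{Mix}_\lambda(u_i, u_j)\big)
   - \mathrm{Mix}_\lambda\big(f_{\theta'}(u_i), f_{\theta'}(u_j)\big) \Big\|^2
```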
no code implementations • 12 Jan 2019 • Jascha Sohl-Dickstein, Kenji Kawaguchi
Recent work has noted that all bad local minima can be removed from neural network loss landscapes, by adding a single unit with a particular parameterization.
no code implementations • 2 Jan 2019 • Kenji Kawaguchi, Leslie Pack Kaelbling
At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network.
no code implementations • 20 Nov 2018 • Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling
In this paper, we analyze the effects of depth and width on the quality of local minima, without strong over-parameterization and simplification assumptions in the literature.
no code implementations • 21 Oct 2018 • Kenji Kawaguchi, Yoshua Bengio
In this paper, we prove that depth with nonlinearity creates no bad local minima in a type of arbitrarily deep ResNets with arbitrary nonlinear activation functions, in the sense that the values of all local minima are no worse than the global minimum value of corresponding classical machine-learning models, and are guaranteed to further improve via residual representations.
2 code implementations • 21 Feb 2018 • Kenji Kawaguchi, Yoshua Bengio, Vikas Verma, Leslie Pack Kaelbling
This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions.
no code implementations • 30 Dec 2017 • Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar
In this note, we show that the dynamics associated with gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to a linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or cross-entropy loss) Hessian.
no code implementations • 16 Oct 2017 • Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio
This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature.
1 code implementation • 28 Feb 2017 • Kenji Kawaguchi, Bo Xie, Vikas Verma, Le Song
For deep models, with no unrealistic assumptions, we prove universal approximation ability, a lower bound on approximation error, a partial optimization guarantee, and a generalization bound.
no code implementations • 27 Feb 2017 • Haihao Lu, Kenji Kawaguchi
In deep learning, depth, as well as nonlinearity, creates non-convex loss surfaces.
no code implementations • 19 Oct 2016 • Qianli Liao, Kenji Kawaguchi, Tomaso Poggio
We systematically explored a spectrum of normalization algorithms related to Batch Normalization (BN) and propose a generalized formulation that simultaneously solves two major limitations of BN: (1) online learning and (2) recurrent learning.
no code implementations • 17 Jul 2016 • Kenji Kawaguchi, Yu Maruyama, Xiaoyu Zheng
This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), to aim for both fast convergence in practice and finite-time error bound in theory.
1 code implementation • NeurIPS 2016 • Kenji Kawaguchi
In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015.
no code implementations • 5 Apr 2016 • Kenji Kawaguchi
Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration.
no code implementations • NeurIPS 2015 • Kenji Kawaguchi, Leslie Pack Kaelbling, Tomás Lozano-Pérez
This paper presents a Bayesian optimization method with exponential convergence without the need of auxiliary optimization and without the delta-cover sampling.
no code implementations • 13 Mar 2013 • Kenji Kawaguchi, Mauricio Araya
This is a natural consequence of the fact that the proposed algorithm is greedier than the other algorithms.