no code implementations • 29 Oct 2024 • Chenyu Gao, Michael E. Kim, Karthik Ramadass, Praitayini Kanakaraj, Aravind R. Krishnan, Adam M. Saunders, Nancy R. Newlin, Ho Hin Lee, Qi Yang, Warren D. Taylor, Brian D. Boyd, Lori L. Beason-Held, Susan M. Resnick, Lisa L. Barnes, David A. Bennett, Katherine D. Van Schaik, Derek B. Archer, Timothy J. Hohman, Angela L. Jefferson, Ivana Išgum, Daniel Moyer, Yuankai Huo, Kurt G. Schilling, Lianrui Zuo, Shunxing Bao, Nazirah Mohd Khairi, Zhiyuan Li, Christos Davatzikos, Bennett A. Landman
We observe a difference between our dMRI-based brain age and T1w MRI-based brain age across stages of neurodegeneration: dMRI-based brain age is older than T1w MRI-based brain age in participants transitioning from cognitively normal (CN) to mild cognitive impairment (MCI), but younger in participants already diagnosed with Alzheimer's disease (AD).
no code implementations • 21 Oct 2024 • Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka
To obtain this result, our main technical contribution is to show that label noise SGD always minimizes the sharpness on the manifold of models with zero loss for two-layer networks.
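For concreteness, the sharpness objective in question can be stated schematically (under standard definitions from this line of work, not necessarily the paper's exact theorem): on the zero-loss manifold, label noise SGD acts as a flow minimizing the trace of the loss Hessian,

$$ \min_{\theta \in \Gamma} \operatorname{tr}\big(\nabla^2 L(\theta)\big), \qquad \Gamma = \{\theta : L(\theta) = 0\}. $$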
1 code implementation • 10 Oct 2024 • Shuo Xie, Mohamad Amin Mohamadi, Zhiyuan Li
Yet this advantage is not well understood theoretically -- previous convergence analyses for Adam and SGD focus mainly on the number of steps $T$ and are already minimax-optimal in non-convex settings, with both methods achieving a rate of $\widetilde{O}(T^{-1/4})$.
no code implementations • 7 Oct 2024 • Kaiyue Wen, Zhiyuan Li, Jason Wang, David Hall, Percy Liang, Tengyu Ma
In contrast, the Warmup-Stable-Decay (WSD) schedule uses a constant learning rate to produce a main branch of iterates that can in principle continue indefinitely without a pre-specified compute budget.
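A minimal sketch of a WSD-style schedule, assuming linear warmup and linear decay; the phase lengths and peak rate below are illustrative, not the paper's settings:

```python
from typing import Optional

def wsd_lr(step: int, peak_lr: float = 3e-4, warmup_steps: int = 1000,
           decay_steps: int = 2000, total_steps: Optional[int] = None) -> float:
    """Warmup-Stable-Decay: warm up, hold constant, and only decay once a
    stopping point `total_steps` has been chosen."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # linear warmup
    if total_steps is None or step < total_steps - decay_steps:
        return peak_lr                                # stable phase: no budget needed
    return peak_lr * max(total_steps - step, 0) / decay_steps  # linear decay
```

The stable phase is what allows training to continue indefinitely; a decay branch can be forked from any checkpoint by fixing `total_steps`.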
1 code implementation • 3 Oct 2024 • Zhiyuan Li, Yanhui Zhou, Hao Wei, Chenyang Ge, Ajmal Mian
Second, we design a novel relay residual diffusion that reconstructs the raw image by iteratively removing the added noise and the residual between the compressed and target latent features.
no code implementations • 21 Sep 2024 • Zhiyuan Li, Dongnan Liu, Chaoyi Zhang, Heng Wang, Tengfei Xue, Weidong Cai
Recent advancements in Vision-Language (VL) research have sparked new benchmarks for complex visual reasoning, challenging models' advanced reasoning ability.
no code implementations • 20 Sep 2024 • Zhiyuan Li, Tianyuan Yao, Praitayini Kanakaraj, Chenyu Gao, Shunxing Bao, Lianrui Zuo, Michael E. Kim, Nancy R. Newlin, Gaurav Rudravaram, Nazirah M. Khairi, Yuankai Huo, Kurt G. Schilling, Walter A. Kukull, Arthur W. Toga, Derek B. Archer, Timothy J. Hohman, Bennett A. Landman
We hypothesize that by this design the proposed framework can enhance the imputation performance of the dMRI scans and therefore be useful for repairing whole-brain tractography in corrupted dMRI scans with incomplete FOV.
no code implementations • 4 Sep 2024 • Zhiyuan Li, YanFeng Lu, Yao Mu, Hong Qiao
Firstly, it constructs a cognitive map, integrating temporal, spatial, and semantic elements, thereby facilitating the development of spatial memory within LLMs.
no code implementations • 4 Sep 2024 • Zhiyuan Li, Yanfeng Lv, Ziqin Tu, Di Shang, Hong Qiao
Comparative experiments with existing continual learning and VLN methods show significant improvements, achieving state-of-the-art performance in continual learning ability and highlighting the potential of our approach in enabling rapid adaptation while preserving prior knowledge.
no code implementations • 28 Aug 2024 • Wei Chen, Zhiyuan Li, Shuo Xin, Yihao Wang
Our work contributes to the development of more sustainable and scalable language models for on-device applications, addressing the critical need for energy-efficient and responsive AI technologies in resource-constrained environments while maintaining the accuracy to understand long contexts.
1 code implementation • 26 Aug 2024 • Jiajun Xu, Zhiyuan Li, Wei Chen, Qun Wang, Xin Gao, Qi Cai, Ziyuan Ling
For a comprehensive review of research work and educational resources on on-device large language models (LLMs), please visit https://github.com/NexaAI/Awesome-LLMs-on-device.
1 code implementation • 15 Aug 2024 • Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai
However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided?
no code implementations • 17 Jul 2024 • Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland
We present a theoretical explanation of the "grokking" phenomenon, where a model generalizes long after overfitting, for the originally studied problem of modular addition.
no code implementations • 26 Jun 2024 • Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen
In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8 billion parameter LLM optimized for edge devices, and an action agent using the Octopus model for function execution.
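A minimal sketch of the planner/action split described above; the model names come from the paper, but `generate()` and the prompt formats are hypothetical placeholders rather than the authors' actual interface:

```python
def run_task(task: str, planner, actor, tools: dict):
    # 1) The small on-device planner (e.g., Phi-3 Mini) decomposes the task.
    steps = planner.generate(f"Decompose into steps: {task}").splitlines()
    results = []
    for step in steps:
        # 2) The action agent (e.g., the Octopus model) maps each step
        #    to a concrete function call, executed locally.
        call = actor.generate(f"Select a function call for: {step}")
        name, _, arg = call.partition(":")
        if name in tools:
            results.append(tools[name](arg.strip()))
    return results
```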
no code implementations • 9 May 2024 • Yicheng Yang, Xinyu Wang, Haoming Yu, Zhiyuan Li
Our experiments show that the difficulty level of questions generated by our AQG approach is similar to the questions presented to students in the textbook [1].
no code implementations • 6 May 2024 • Chenyu Gao, Shunxing Bao, Michael Kim, Nancy Newlin, Praitayini Kanakaraj, Tianyuan Yao, Gaurav Rudravaram, Yuankai Huo, Daniel Moyer, Kurt Schilling, Walter Kukull, Arthur Toga, Derek Archer, Timothy Hohman, Bennett Landman, Zhiyuan Li
We hypothesize that the imputed image with complete FOV can improve the whole-brain tractography for corrupted data with incomplete FOV.
no code implementations • 30 Apr 2024 • Wei Chen, Zhiyuan Li
Additionally, we explore the use of graphs as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and *functional tokens*.
1 code implementation • 29 Apr 2024 • Zhiyuan Li, Yanhui Zhou, Hao Wei, Chenyang Ge, Jingwen Jiang
In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates.
2 code implementations • 25 Apr 2024 • Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, Chengjian Zheng, Diankai Zhang, Ning Wang, Xintao Qiu, Yuanbo Zhou, Kongxian Wu, Xinwei Dai, Hui Tang, Wei Deng, Qingquan Gao, Tong Tong, Jae-Hyeon Lee, Ui-Jin Choi, Min Yan, Xin Liu, Qian Wang, Xiaoqian Ye, Zhan Du, Tiansen Zhang, Long Peng, Jiaming Guo, Xin Di, Bohao Liao, Zhibo Du, Peize Xia, Renjing Pei, Yang Wang, Yang Cao, ZhengJun Zha, Bingnan Han, Hongyuan Yu, Zhuoyuan Wu, Cheng Wan, Yuqing Liu, Haodong Yu, Jizhe Li, Zhijuan Huang, Yuan Huang, Yajun Zou, Xianyu Guan, Qi Jia, Heng Zhang, Xuanwu Yin, Kunlong Zuo, Hyeon-Cheol Moon, Tae-hyun Jeong, Yoonmo Yang, Jae-Gon Kim, Jinwoo Jeong, Sunjei Kim
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs.
no code implementations • 17 Apr 2024 • Wei Chen, Zhiyuan Li
A multimodal AI agent is characterized by its ability to process and learn from various types of data, including natural language, visual, and audio inputs, to inform its actions.
no code implementations • 5 Apr 2024 • Shuo Xie, Zhiyuan Li
Adam with decoupled weight decay, also known as AdamW, is widely acclaimed for its superior performance in language modeling tasks, surpassing Adam with $\ell_2$ regularization in terms of generalization and optimization.
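A minimal sketch of the distinction at issue, with illustrative hyperparameters: Adam with $\ell_2$ regularization folds the decay term into the gradient before the adaptive preconditioning, whereas AdamW applies it directly to the weights:

```python
import torch

def adamw_step(p, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    # Moment estimates use the raw gradient; under l2 regularization,
    # `grad` would instead be grad + weight_decay * p before this point.
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])
    m_hat = m / (1 - betas[0] ** t)   # bias correction, t starting at 1
    v_hat = v / (1 - betas[1] ** t)
    # Decoupled decay: applied to the weights, unscaled by the preconditioner.
    p.mul_(1 - lr * weight_decay)
    p.sub_(lr * m_hat / (v_hat.sqrt() + eps))
```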
no code implementations • 2 Apr 2024 • Wei Chen, Zhiyuan Li
Current on-device models for function calling face issues with latency and accuracy.
no code implementations • 2 Apr 2024 • Wei Chen, Zhiyuan Li, Mingyuan Ma
In the rapidly evolving domain of artificial intelligence, Large Language Models (LLMs) play a crucial role due to their advanced text processing and generation abilities.
no code implementations • 24 Feb 2024 • Zhiyuan Li, Chenyang Ge, Shun Li
Recently, many deep image compression methods have been proposed and achieved remarkable performance.
no code implementations • 20 Feb 2024 • Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT.
no code implementations • 16 Jan 2024 • Zhiyuan Li, Wenshuai Zhao, Lijun Wu, Joni Pajarinen
Inspired by the concept of correlated equilibrium, we propose to introduce a *strategy modification* to provide a mechanism for agents to correlate their policies.
no code implementations • 22 Dec 2023 • Zhiyuan Li, Hailong Li, Anca L. Ralescu, Jonathan R. Dillman, Mekibib Altaye, Kim M. Cecil, Nehal A. Parikh, Lili He
The integration of different imaging modalities, such as structural, diffusion tensor, and functional magnetic resonance imaging, with deep learning models has yielded promising outcomes in discerning phenotypic characteristics and enhancing disease diagnosis.
1 code implementation • 30 Nov 2023 • Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu
Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect training accuracy but near-random test accuracy, and after training for sufficiently longer, it suddenly transitions to perfect test accuracy.
1 code implementation • 9 Nov 2023 • Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method.
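A minimal SVRG sketch following Johnson & Zhang (2013); `grad_i`, the array type of `w`, and the epoch lengths are assumptions for illustration:

```python
import random

def svrg(w, grad_i, n, lr=0.1, outer_steps=10, inner_steps=100):
    """`grad_i(w, i)` returns the gradient of the i-th example's loss at w
    (a NumPy array); n is the number of training examples."""
    for _ in range(outer_steps):
        w_snap = w.copy()
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(inner_steps):
            i = random.randrange(n)
            # Variance-reduced direction: unbiased for the full gradient,
            # with variance shrinking as w approaches w_snap.
            w = w - lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)
    return w
```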
1 code implementation • 4 Nov 2023 • Tiancheng Gu, Dongnan Liu, Zhiyuan Li, Weidong Cai
The goal of automatic report generation is to generate a clinically accurate and coherent phrase from a single given X-ray image, which could alleviate the workload of traditional radiology reporting.
1 code implementation • 3 Nov 2023 • Wenshuai Zhao, Yi Zhao, Zhiyuan Li, Juho Kannala, Joni Pajarinen
*Relative overgeneralization* (RO) occurs in cooperative multi-agent learning tasks when agents converge towards a suboptimal joint policy due to overfitting to suboptimal behavior of other agents.
no code implementations • 12 Aug 2023 • Nilesh Kumar, Ruby Shrestha, Zhiyuan Li, Linwei Wang
Spurious correlation caused by subgroup underrepresentation has received increasing attention as a source of bias that can be perpetuated by deep neural networks (DNNs).
no code implementations • 27 Jul 2023 • Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise.
1 code implementation • 27 Jul 2023 • Zhiyuan Li, Dongnan Liu, Heng Wang, Chaoyi Zhang, Weidong Cai
Recently, training an image captioner without annotated image-sentence pairs has gained traction.
1 code implementation • 22 Jul 2023 • Yafei Zhang, Zhiyuan Li, Huafeng Li, Dapeng Tao
To this end, a multi-modal MR brain tumor segmentation method with tumor prototype-driven and multi-expert integration is proposed.
no code implementations • 22 Jun 2023 • Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions.
6 code implementations • 23 May 2023 • Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training.
1 code implementation • ICLR 2023 • Xiajun Jiang, Ryan Missel, Zhiyuan Li, Linwei Wang
We compared the presented framework with a comprehensive set of baseline models trained 1) globally on the large meta-training set with diverse dynamics, and 2) individually on single dynamics, both with and without fine-tuning to k-shot support series used by the meta-models.
no code implementations • 16 Apr 2023 • Zhiyuan Li, Ziru Liu, Anna Zou, Anca L. Ralescu
Deep metric learning techniques have been used for visual representation in various supervised and unsupervised learning tasks through learning embeddings of samples with deep networks.
1 code implementation • 20 Feb 2023 • Zhiyuan Li, Hailong Li, Anca L. Ralescu, Jonathan R. Dillman, Nehal A. Parikh, Lili He
We compared our proposed method with other state-of-the-art self-supervised learning methods on a simulation study and two independent datasets.
no code implementations • 31 Jan 2023 • Zhiyuan Li, Anca Ralescu
Recently, deep metric learning techniques have received attention, as the learned distance representations are useful for capturing the similarity relationships among samples and further improve the performance of various supervised and unsupervised learning tasks.
no code implementations • 27 Jan 2023 • Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models.
no code implementations • 10 Nov 2022 • Kaiyue Wen, Tengyu Ma, Zhiyuan Li
SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees.
no code implementations • 2 Nov 2022 • Maryam Toloubidokhti, Nilesh Kumar, Zhiyuan Li, Prashnna K. Gyawali, Brian Zenger, Wilson W. Good, Rob S. MacLeod, Linwei Wang
Prior knowledge about the imaging physics provides a mechanistic forward operator that plays an important role in image reconstruction, although myriad sources of possible errors in the operator could negatively impact the reconstruction solutions.
no code implementations • 25 Oct 2022 • Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
Toward understanding this implicit bias, we prove that SGD with standard mini-batch noise implicitly prefers flatter minima in language models, and empirically observe a strong correlation between flatness and downstream performance among models with the same minimal pre-training loss.
1 code implementation • 6 Oct 2022 • Xiajun Jiang, Zhiyuan Li, Ryan Missel, Md Shakil Zaman, Brian Zenger, Wilson W. Good, Rob S. MacLeod, John L. Sapp, Linwei Wang
At test time, metaPNS delivers a personalized neural surrogate by fast feed-forward embedding of a small and flexible amount of data available from an individual, achieving -- for the first time -- personalization and surrogate construction for expensive simulations in one end-to-end learning framework.
no code implementations • 8 Jul 2022 • Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora
Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.
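Schematically, the correspondence being stated is (under standard definitions, not the paper's exact notation): continuous mirror descent with Legendre function $\Phi$,

$$ \frac{d}{dt}\,\nabla \Phi(w_t) = -\nabla L(w_t), $$

can be realized as a gradient flow $\dot{u}_t = -\nabla_u L\big(G(u_t)\big)$ under a related commuting parametrization $w_t = G(u_t)$.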
no code implementations • 14 Jun 2022 • Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets.
no code implementations • 20 May 2022 • Wenshuai Zhao, Zhiyuan Li, Joni Pajarinen
Inspired by the success of CRL in single-agent settings, a few works have attempted to apply CRL to multi-agent reinforcement learning (MARL) using the number of agents to control task difficulty.
no code implementations • 19 May 2022 • Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi
The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss.
no code implementations • 24 Mar 2022 • Anna Zou, Zhiyuan Li
Deep learning remains a powerful state-of-the-art technique that has achieved extraordinary accuracy in a variety of regression and classification tasks, including image, video, signal, and natural language data.
no code implementations • 8 Feb 2022 • Zhiyuan Li, Hailong Li, Adebayo Braimah, Jonathan R. Dillman, Nehal A. Parikh, Lili He
We applied the OAP-EL to predict cognitive deficits at 2 years of age using quantitative brain maturation and geometric features obtained at term equivalent age in very preterm infants.
no code implementations • 2 Feb 2022 • Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar
In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models.
no code implementations • NeurIPS 2021 • Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora
The current paper is able to establish this global optimality for two-layer Leaky ReLU nets trained with gradient flow on linearly separable and symmetric data, regardless of the width.
no code implementations • ICLR 2022 • Zhiyuan Li, Tianhao Wang, Sanjeev Arora
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold.
1 code implementation • ICCV 2021 • Limeng Qiao, Yuxuan Zhao, Zhiyuan Li, Xi Qiu, Jianan Wu, Chi Zhang
Few-shot object detection, which aims at detecting novel objects rapidly from extremely few annotated examples of previously unseen classes, has attracted significant research interest in the community.
no code implementations • 25 Mar 2021 • Yaqi Duan, Chi Jin, Zhiyuan Li
Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the following: (1) in the double sampling regime, the excess risk of the Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class.
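For reference, the Bellman error surrogate can be written in its standard squared form (standard definitions, not necessarily the paper's exact notation), where $\mathcal{T}$ is the Bellman optimality operator:

$$ \mathcal{E}(f) = \mathbb{E}_{(s,a)}\Big[\big(f(s,a) - \mathcal{T}f(s,a)\big)^2\Big], \qquad \mathcal{T}f(s,a) = \mathbb{E}\big[r + \gamma \max_{a'} f(s',a') \,\big|\, s,a\big]. $$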
no code implementations • 24 Mar 2021 • Zhiyuan Li
To address this challenge, we propose a two-stage approach for the group variable selection problem, combining a variable clustering stage with a group variable selection stage.
1 code implementation • NeurIPS 2021 • Zhiyuan Li, Sadhika Malladi, Sanjeev Arora
It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets.
no code implementations • 26 Jan 2021 • Mengfei Zhang, Zhiyuan Li, Mark R. Morris
Our simulations are run with different combinations of two main parameters: the supernova birth rate and the strength of a global magnetic field oriented vertically with respect to the disk.
no code implementations • ICLR 2021 • Zhiyuan Li, Yuping Luo, Kaifeng Lyu
Matrix factorization is a simple and natural test-bed to investigate the implicit regularization of gradient descent.
no code implementations • ICLR 2021 • Zhiyuan Li, Yi Zhang, Sanjeev Arora
However, this has not been made mathematically rigorous, and the hurdle is that the fully connected net can always simulate the convolutional net (for a fixed task).
no code implementations • NeurIPS 2020 • Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora
Recent works (e.g., Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., use of exponentially increasing learning rates.
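A toy sketch of the schedule family in question; the growth factor below is illustrative, not the equivalence constant derived in that line of work:

```python
def exp_lr(step: int, base_lr: float = 0.1, alpha: float = 1.0005) -> float:
    # Exponentially increasing learning rate: eta_t = eta_0 * alpha^t.
    # The cited work argues that, for networks with normalization layers,
    # such schedules can behave like conventional constant-LR training.
    return base_lr * alpha ** step
```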
no code implementations • 18 Jul 2020 • Xiajun Jiang, Sandesh Ghimire, Jwala Dhamala, Zhiyuan Li, Prashnna Kumar Gyawali, Linwei Wang
However, many reconstruction problems involve imaging physics that are dependent on the underlying non-Euclidean geometry.
no code implementations • 26 Jun 2020 • Ping Zhou, Shing-Chi Leung, Zhiyuan Li, Ken'ichi Nomoto, Jacco Vink, Yang Chen
We report evidence that SNR Sgr A East in the Galactic center resulted from a pure turbulent deflagration of a Chandrasekhar-mass carbon-oxygen WD, an explosion mechanism used for type Iax SNe.
no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu
Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
1 code implementation • 22 May 2020 • Prashnna Kumar Gyawali, Sandesh Ghimire, Pradeep Bajracharya, Zhiyuan Li, Linwei Wang
In this work, we argue that regularizing the global smoothness of neural functions by filling the void in between data points can further improve SSL.
1 code implementation • ICLR 2020 • Zhiyuan Li, Jaideep Vitthal Murkute, Prashnna Kumar Gyawali, Linwei Wang
By drawing on the respective advantage of hierarchical representation learning and progressive learning, this is to our knowledge the first attempt to improve disentanglement by progressively growing the capacity of VAE to learn hierarchical representations.
no code implementations • NeurIPS 2020 • Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu
For certain stepsizes of $g$ and $w$, we show that they can converge close to the minimum-norm solution.
no code implementations • 3 Nov 2019 • Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora
An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that the classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (the best figure being around 78%), which is interesting performance for a fixed kernel.
no code implementations • ICLR 2020 • Zhiyuan Li, Sanjeev Arora
This paper suggests that the phenomenon may be due to Batch Normalization or BN, which is ubiquitous and provides benefits in optimization and generalization across all standard architectures.
4 code implementations • ICLR 2020 • Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.
1 code implementation • 3 Sep 2019 • Prashnna Kumar Gyawali, Zhiyuan Li, Cameron Knight, Sandesh Ghimire, B. Milan Horacek, John Sapp, Linwei Wang
We note that the independence within and the complexity of the latent density are two different properties we constrain when regularizing the posterior density: while the former promotes the disentangling ability of VAE, the latter -- if overly limited -- creates an unnecessary competition with the data reconstruction objective in VAE.
1 code implementation • 22 Jul 2019 • Prashnna Kumar Gyawali, Zhiyuan Li, Sandesh Ghimire, Linwei Wang
In this work, we hypothesize -- from the generalization perspective -- that self-ensembling can be improved by exploiting the stochasticity of a disentangled latent space.
1 code implementation • NeurIPS 2019 • Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge
Mode connectivity is a surprising phenomenon in the loss landscape of deep nets.
no code implementations • 6 Jun 2019 • Jie Cai, Zibo Meng, Ahmed Shehab Khan, Zhiyuan Li, James O'Reilly, Shizhong Han, Ping Liu, Min Chen, Yan Tong
In this paper, we proposed two strategies to fuse information extracted from different modalities, i.e., audio and visual.
no code implementations • ICLR 2020 • Wei Hu, Zhiyuan Li, Dingli Yu
Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data.
1 code implementation • ICLR 2019 • Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann Lecun, Nathan Srebro
Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.
2 code implementations • NeurIPS 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.
no code implementations • 19 Mar 2019 • Jie Cai, Zibo Meng, Ahmed Shehab Khan, Zhiyuan Li, James O'Reilly, Shizhong Han, Yan Tong
A novel Identity-Free conditional Generative Adversarial Network (IF-GAN) was proposed for Facial Expression Recognition (FER) to explicitly reduce high inter-subject variations caused by identity-related facial attributes, e.g., age, race, and gender.
no code implementations • 24 Jan 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].
no code implementations • 17 Dec 2018 • Jie Cai, Zibo Meng, Ahmed Shehab Khan, Zhiyuan Li, James O'Reilly, Yan Tong
In this paper, we proposed a novel Probabilistic Attribute Tree-CNN (PAT-CNN) to explicitly deal with the large intra-class variations caused by identity-related attributes, e.g., age, race, and gender.
no code implementations • ICLR 2019 • Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu
Batch Normalization (BN) has become a cornerstone of deep learning across diverse architectures, appearing to help optimization as well as generalization.
no code implementations • 15 Nov 2018 • Zhiyuan Li, Min Jin, Qi Wu, Huaxiang Lu
Mirroring their remarkable achievements in many computer vision tasks, convolutional neural networks (CNNs) provide a highly successful end-to-end solution for handwritten Chinese character recognition (HCCR).
2 code implementations • 30 May 2018 • Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann Lecun, Nathan Srebro
Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.
no code implementations • NeurIPS 2018 • Elad Hazan, Wei Hu, Yuanzhi Li, Zhiyuan Li
We revisit the question of reducing online learning to approximate optimization of the offline problem.
no code implementations • 4 Apr 2018 • Zhiyuan Li, Nanjun Teng, Min Jin, Huaxiang Lu
Methods based on deep convolutional networks have brought great breakthroughs in image classification, providing an end-to-end solution for the handwritten Chinese character recognition (HCCR) problem by learning discriminative features automatically.
1 code implementation • 9 Oct 2017 • Jie Cai, Zibo Meng, Ahmed Shehab Khan, Zhiyuan Li, James O'Reilly, Yan Tong
Over the past few years, Convolutional Neural Networks (CNNs) have shown promise on facial expression recognition.
no code implementations • CVPR 2018 • Shizhong Han, Zibo Meng, Zhiyuan Li, James O'Reilly, Jie Cai, Xiao-Feng Wang, Yan Tong
Most recently, Convolutional Neural Networks (CNNs) have shown promise for facial AU recognition, where predefined and fixed convolution filter sizes are employed.
no code implementations • NeurIPS 2016 • Yexiang Xue, Zhiyuan Li, Stefano Ermon, Carla P. Gomes, Bart Selman
Arising from many applications at the intersection of decision making and machine learning, Marginal Maximum A Posteriori (Marginal MAP) Problems unify the two main classes of inference, namely maximization (optimization) and marginal inference (counting), and are believed to have higher complexity than both of them.
no code implementations • NeurIPS 2016 • Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Eva Tardos
We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games.