no code implementations • 19 Sep 2024 • Chenyu Wang, Shuo Yan, Yixuan Chen, Yujiang Wang, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Robert P. Dick, Qin Lv, Fan Yang, Tun Lu, Ning Gu, Li Shang
Our key discovery is that the coarse-grained noise in earlier denoising steps exhibits high motion consistency across consecutive video frames.
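A minimal sketch of how such cross-frame motion consistency could be measured; the tensor shapes and the random stand-in data are illustrative, not the paper's setup:

```python
import torch

def motion_consistency(eps: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity of predicted noise between consecutive frames.

    eps: (steps, frames, C, H, W) noise predictions from a video diffusion
    model. High values at early (coarse-grained) steps would reflect the
    cross-frame consistency described above.
    """
    flat = eps.flatten(start_dim=2)           # (steps, frames, C*H*W)
    a, b = flat[:, :-1], flat[:, 1:]          # consecutive frame pairs
    cos = torch.nn.functional.cosine_similarity(a, b, dim=-1)
    return cos.mean(dim=1)                    # one score per denoising step

# Stand-in data: replace with noise actually predicted by a diffusion model.
eps = torch.randn(50, 16, 4, 32, 32)
print(motion_consistency(eps))
```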
1 code implementation • 22 Aug 2024 • Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert P. Dick, Hidenori Tanaka
We empirically investigate this definition by proposing an experimental system grounded in a context-sensitive formal language and find that Transformers trained to perform tasks on top of strings from this language indeed exhibit emergent capabilities.
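For illustration, a toy context-sensitive language of the classic $\{a^n b^n c^n\}$ family (the paper's formal language may differ) can be sampled and membership-tested as follows:

```python
import random

def sample_anbncn(max_n: int = 8) -> str:
    """Sample a string from the context-sensitive language {a^n b^n c^n}."""
    n = random.randint(1, max_n)
    return "a" * n + "b" * n + "c" * n

def in_language(s: str) -> bool:
    """Membership test: the three blocks must be ordered and equally long."""
    n = len(s) // 3
    return n > 0 and s == "a" * n + "b" * n + "c" * n

corpus = [sample_anbncn() for _ in range(5)]
print(corpus, [in_language(s) for s in corpus])
```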
no code implementations • NeurIPS 2023 • Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang
To describe such modular-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$.
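A minimal sketch of computing a module-wise NTK and its principal eigenvalue for a toy scalar-output MLP; the architecture and batch are placeholders, not the paper's models:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
x = torch.randn(16, 10)

def mntk_lambda_max(module: nn.Module) -> float:
    """Top eigenvalue of the empirical NTK restricted to one module's params."""
    params = list(module.parameters())
    rows = []
    for i in range(x.shape[0]):               # per-sample output gradients
        out = net(x[i : i + 1]).sum()         # scalar output for simplicity
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.flatten() for g in grads]))
    J = torch.stack(rows)                     # (batch, |theta_m|)
    K = J @ J.T                               # module-wise NTK, (batch, batch)
    return torch.linalg.eigvalsh(K)[-1].item()  # principal eigenvalue

for name, mod in [("layer1", net[0]), ("layer2", net[2])]:
    print(name, mntk_lambda_max(mod))
```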
1 code implementation • 21 Nov 2023 • Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka
Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing basic arithmetic.
no code implementations • 21 Nov 2023 • Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger
Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including models that are safe to deploy.
1 code implementation • 26 Oct 2023 • Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman
Large language models (LLMs) trained on huge text corpora demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for.
1 code implementation • NeurIPS 2023 • Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
Motivated by this, we perform a controlled study of compositional generalization in conditional diffusion models in a synthetic setting, varying attributes of the training data and measuring the model's ability to generate samples out of distribution.
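A sketch of the kind of synthetic attribute grid such a study relies on, with one attribute combination held out for out-of-distribution probing; the attribute names here are hypothetical:

```python
import itertools
import random

shapes = ["circle", "square", "triangle"]
colors = ["red", "blue", "green"]
held_out = {("triangle", "green")}            # unseen attribute combination

# Train only on the remaining attribute combinations.
combos = [c for c in itertools.product(shapes, colors) if c not in held_out]
train = [random.choice(combos) for _ in range(1000)]

# OOD evaluation would condition the model on the held-out pair and check
# whether generated samples realize both attributes correctly.
print("train combos:", sorted(set(train)))
print("OOD probe:", held_out)
```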
1 code implementation • 15 Nov 2022 • Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka
We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss.
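A minimal sketch of probing mode connectivity along a linear path between two models' parameters; the paper considers more general simple paths, and the toy models here are placeholders:

```python
import copy
import torch
import torch.nn as nn

def linear_path_losses(model_a, model_b, loss_fn, x, y, steps=11):
    """Evaluate the loss at points (1 - t) * theta_a + t * theta_b."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for t in torch.linspace(0, 1, steps):
        mixed = {k: (1 - t) * sd_a[k] + t * sd_b[k] for k in sd_a}
        probe.load_state_dict(mixed)
        with torch.no_grad():
            losses.append(loss_fn(probe(x), y).item())
    return losses  # a flat profile suggests the two minimizers are connected

# Toy usage with two independently initialized regressors.
net_a, net_b = nn.Linear(5, 1), nn.Linear(5, 1)
x, y = torch.randn(64, 5), torch.randn(64, 1)
print(linear_path_losses(net_a, net_b, nn.MSELoss(), x, y))
```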
1 code implementation • 23 May 2022 • Ekdeep Singh Lubana, Chi Ian Tang, Fahim Kawsar, Robert P. Dick, Akhil Mathur
Federated learning is generally used in tasks where labels are readily available (e.g., next-word prediction).
1 code implementation • 24 Jan 2022 • Yingying Zhao, Yuhu Chang, Yutian Lu, Yujiang Wang, Mingzhi Dong, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang
Experimental studies with 20 participants demonstrate that, thanks to its emotionship awareness, EMOShip not only achieves superior emotion recognition accuracy over existing methods (80.2% vs. 69.4%), but also provides a valuable understanding of the causes of emotions.
1 code implementation • NeurIPS 2021 • Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
BatchNorm has inspired an explosion of normalization layers in deep learning.
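Most of these layers share one template, differing mainly in which axes the statistics are computed over; a minimal sketch of that template (illustrative, not any specific paper formulation):

```python
import torch

def normalize(x, dims, gamma, beta, eps: float = 1e-5):
    """The template most normalization layers share: standardize activations
    over some axes, then apply a learnable affine transform.

    On NCHW input, dims=(0, 2, 3) gives BatchNorm-style statistics and
    dims=(1, 2, 3) gives LayerNorm-style statistics.
    """
    mu = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, keepdim=True, unbiased=False)
    return gamma * (x - mu) / torch.sqrt(var + eps) + beta

x = torch.randn(8, 4, 16, 16)                 # NCHW activations
gamma, beta = torch.ones(1, 4, 1, 1), torch.zeros(1, 4, 1, 1)
bn_like = normalize(x, (0, 2, 3), gamma, beta)
ln_like = normalize(x, (1, 2, 3), gamma, beta)
print(bn_like.mean().item(), ln_like.std().item())
```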
no code implementations • 3 May 2021 • Yuhu Chang, Yingying Zhao, Mingzhi Dong, Yujiang Wang, Yutian Lu, Qin Lv, Robert P. Dick, Tun Lu, Ning Gu, Li Shang
MemX captures human visual attention on the fly, analyzes the salient visual content, and records moments of personal interest in the form of compact video snippets.
no code implementations • 9 Apr 2021 • Yingying Zhao, Mingzhi Dong, Yujiang Wang, Da Feng, Qin Lv, Robert P. Dick, Dongsheng Li, Tun Lu, Ning Gu, Li Shang
By monitoring how varying the input resolution affects the quality of high-dimensional video analytics features, and hence the accuracy of the analytics results, the proposed end-to-end optimization framework learns the best non-myopic policy for dynamically controlling the resolution of input video streams to globally optimize energy efficiency.
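A toy illustration of why a non-myopic policy can beat a greedy one when switching resolutions carries a cost; all numbers below are hypothetical, and the paper learns its policy end-to-end rather than by brute force:

```python
import itertools

# Hypothetical per-frame analytics accuracy at three resolutions over a
# short horizon; scene difficulty varies frame to frame.
res = [480, 720, 1080]
energy = {480: 1.0, 720: 2.1, 1080: 4.0}      # relative energy cost
acc = [
    {480: 0.88, 720: 0.90, 1080: 0.91},       # easy frame
    {480: 0.70, 720: 0.86, 1080: 0.90},       # harder frame
    {480: 0.65, 720: 0.84, 1080: 0.90},
    {480: 0.87, 720: 0.90, 1080: 0.91},
    {480: 0.60, 720: 0.80, 1080: 0.89},
    {480: 0.86, 720: 0.89, 1080: 0.91},
]
lam, switch_cost = 0.05, 0.08                 # trade-off and switch penalty

def score(seq):
    util = sum(acc[t][r] - lam * energy[r] for t, r in enumerate(seq))
    return util - switch_cost * sum(a != b for a, b in zip(seq, seq[1:]))

# Non-myopic: best sequence over the whole horizon, switching costs included.
nonmyopic = max(itertools.product(res, repeat=len(acc)), key=score)
# Myopic: best resolution per frame, ignoring switching costs.
myopic = [max(res, key=lambda r: acc[t][r] - lam * energy[r])
          for t in range(len(acc))]
print("non-myopic:", nonmyopic, round(score(nonmyopic), 3))
print("myopic    :", tuple(myopic), round(score(myopic), 3))
```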
1 code implementation • 18 Feb 2021 • Zuohui Chen, Tony Zhang, Zhuangzhi Chen, Yun Xiang, Qi Xuan, Robert P. Dick
To the best of our knowledge, the main contribution of this paper is the first publicly available air quality dataset with high temporal and spatial resolution, containing simultaneous point sensor measurements and corresponding images.
2 code implementations • 4 Feb 2021 • Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, Robert P. Dick
Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning.
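A minimal sketch of how catastrophic forgetting is typically measured, training sequentially on two toy tasks and tracking the first task's accuracy; the data and model are illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def make_task(shift):
    """2-D Gaussian data; the shift moves the decision boundary per task."""
    x = torch.randn(256, 2) + shift
    y = (x[:, 0] + x[:, 1] > 2 * shift).long()
    return x, y

def fit(x, y, steps=200):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()

def accuracy(x, y):
    return (net(x).argmax(dim=1) == y).float().mean().item()

xa, ya = make_task(0.0)
xb, yb = make_task(4.0)
fit(xa, ya); acc_a_before = accuracy(xa, ya)
fit(xb, yb); acc_a_after = accuracy(xa, ya)   # the drop is the forgetting
print(f"task A accuracy: {acc_a_before:.2f} -> {acc_a_after:.2f}")
```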
1 code implementation • ICLR 2021 • Ekdeep Singh Lubana, Robert P. Dick
We use this framework to determine the relationship between pruning measures and the evolution of model parameters, establishing several results related to pruning models early on in training: (i) magnitude-based pruning removes parameters that contribute least to reduction in loss, resulting in models that converge faster than magnitude-agnostic methods; (ii) loss-preservation-based pruning preserves first-order model evolution dynamics and is therefore appropriate for pruning minimally trained models; and (iii) gradient-norm-based pruning affects second-order model evolution dynamics, such that increasing gradient norm via pruning can produce poorly performing models.
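A sketch of the three families of pruning scores referred to above, computed for a toy linear model; the gradient-norm score is instantiated via double backpropagation (a GraSP-style choice, assumed here rather than taken from the paper):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.functional.mse_loss(net(x), y)

params = list(net.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

w = torch.cat([p.flatten() for p in params])
g = torch.cat([gr.flatten() for gr in grads])

mag_score = w.abs()                  # (i) magnitude-based importance
loss_score = (w * g).abs()           # (ii) loss-preservation: |theta * dL/dtheta|

# (iii) gradient-norm based: sensitivity of ||dL/dtheta||^2 to each parameter,
# obtained as theta * (2 H g) via a second backward pass.
gnorm = (g ** 2).sum()
hg = torch.autograd.grad(gnorm, params)
gn_score = w * torch.cat([h.flatten() for h in hg])

for name, s in [("magnitude", mag_score),
                ("loss-preservation", loss_score),
                ("gradient-norm", gn_score)]:
    print(name, "keeps:", s.topk(3).indices.tolist())
```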
1 code implementation • 10 Sep 2020 • Ekdeep Singh Lubana, Puja Trivedi, Conrad Hougen, Robert P. Dick, Alfred O. Hero
To address this issue, we propose OrthoReg, a principled regularization strategy that enforces orthonormality on a network's filters to reduce inter-filter correlation, thereby allowing reliable, efficient determination of group importance estimates, improved trainability of pruned networks, and efficient, simultaneous pruning of large groups of filters.
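A minimal sketch of one common way to write such an orthonormality regularizer over a layer's flattened filters; the paper's exact formulation may differ:

```python
import torch
import torch.nn as nn

def orthonormality_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """|| W W^T - I ||_F^2 over flattened filters: pushes each filter toward
    unit norm and distinct filters toward mutual orthogonality, reducing
    inter-filter correlation."""
    W = conv.weight.flatten(start_dim=1)      # (out_channels, in*k*k)
    gram = W @ W.T
    eye = torch.eye(gram.shape[0], device=W.device)
    return ((gram - eye) ** 2).sum()

conv = nn.Conv2d(3, 16, kernel_size=3)
penalty = orthonormality_penalty(conv)        # add lambda * penalty to the task loss
print(penalty.item())
```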