no code implementations • 16 Oct 2024 • Arham Khan, Todd Nief, Nathaniel Hudson, Mansi Sakarvadia, Daniel Grzenda, Aswathy Ajith, Jordan Pettyjohn, Kyle Chard, Ian Foster
We analyze the model merging literature through the lens of loss landscape geometry, an approach that enables us to connect observations from empirical studies on interpretability, security, model merging, and loss landscape analysis to phenomena that govern neural network training and the emergence of their inner representations.
1 code implementation • 3 Oct 2024 • Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney
Language models (LMs) can "memorize" information, i. e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data.
no code implementations • 24 Sep 2024 • Nathaniel Hudson, Valerie Hayot-Sasson, Yadu Babuji, Matt Baughman, J. Gregory Pauloski, Ryan Chard, Ian Foster, Kyle Chard
Federated Learning (FL) is a decentralized machine learning paradigm where models are trained on distributed devices and are aggregated at a central server.
no code implementations • 26 Aug 2024 • Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster
Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities.
no code implementations • 21 Feb 2024 • Zhi Hong, Kyle Chard, Ian Foster
Relation extraction is an efficient way of mining the extraordinary wealth of human knowledge on the Web.
no code implementations • 5 Feb 2024 • Nathaniel Hudson, J. Gregory Pauloski, Matt Baughman, Alok Kamatar, Mansi Sakarvadia, Logan Ward, Ryan Chard, André Bauer, Maksim Levental, Wenyi Wang, Will Engler, Owen Price Skelly, Ben Blaiszik, Rick Stevens, Kyle Chard, Ian Foster
Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries.
no code implementations • 4 Jan 2024 • André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster
This work surveys 417 Synthetic Data Generation (SDG) models over the last decade, providing a comprehensive overview of model types, functionality, and improvements.
1 code implementation • 25 Oct 2023 • Mansi Sakarvadia, Arham Khan, Aswathy Ajith, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster
Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks.
1 code implementation • 11 Sep 2023 • Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster
Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources.
no code implementations • 28 Aug 2023 • Samir Rajani, Dario Dematties, Nathaniel Hudson, Kyle Chard, Nicola Ferrier, Rajesh Sankaran, Peter Beckman
Despite this, recent works have demonstrated that data reconstruction can be done with the locally trained model updates which are communicated across the network.
no code implementations • 28 Apr 2023 • Omer Rana, Theodoros Spyridopoulos, Nathaniel Hudson, Matt Baughman, Kyle Chard, Ian Foster, Aftab Khan
Hierarchical Federated Learning is likely to be a key enabler for a wide range of applications, such as smart farming and smart energy management, as it can improve performance and reduce costs, whilst also enabling FL workflows to be deployed in environments that are not well-suited to traditional FL.
2 code implementations • 15 Mar 2023 • Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Ryan Chard, Yadu Babuji, Ganesh Sivaraman, Sutanay Choudhury, Kyle Chard, Rajeev Thakur, Ian Foster
Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators.
no code implementations • 13 Feb 2023 • Maksim Levental, Arham Khan, Ryan Chard, Kazutomo Yoshii, Kyle Chard, Ian Foster
In many experiment-driven scientific domains, such as high-energy physics, material science, and cosmology, high data rate experiments impose hard constraints on data acquisition systems: collected data must either be indiscriminately stored for post-processing and analysis, thereby necessitating large storage capacity, or accurately filtered in real-time, thereby necessitating low-latency processing.
no code implementations • 19 Aug 2022 • Ryan Chard, Jim Pruyne, Kurt McKee, Josh Bryan, Brigitte Raumann, Rachana Ananthakrishnan, Kyle Chard, Ian Foster
We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions, \emph{flows}, and the execution of such flows in heterogeneous research environments.
1 code implementation • 1 Jul 2022 • Nikil Ravi, Pranshu Chaturvedi, E. A. Huerta, Zhengchun Liu, Ryan Chard, Aristana Scourtas, K. J. Schmidt, Kyle Chard, Ben Blaiszik, Ian Foster
A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation.
no code implementations • 23 May 2022 • Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster
Transformer-based masked language models such as BERT, trained on general corpora, have shown impressive performance on downstream tasks.
1 code implementation • 6 Oct 2021 • Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster
Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform.
no code implementations • 26 Aug 2021 • Maksim Levental, Ryan Chard, Kyle Chard, Ian Foster, Gregg A. Wildenberg
Technological advancements in modern scientific instruments, such as scanning electron microscopes (SEMs), have significantly increased data acquisition rates and image resolutions enabling new questions to be explored; however, the resulting data volumes and velocities, combined with automated experiments, are quickly overwhelming scientists as there remain crucial steps that require human intervention, for example reviewing image focus.
3 code implementations • 4 Jul 2021 • J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang
Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models.
1 code implementation • 12 Jan 2021 • Zhi Hong, J. Gregory Pauloski, Logan Ward, Kyle Chard, Ben Blaiszik, Ian Foster
Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
1 code implementation • 16 Oct 2020 • Maksim Levental, Ryan Chard, Joseph A. Libera, Kyle Chard, Aarthi Koripelly, Jakob R. Elias, Marcus Schwarting, Ben Blaiszik, Marius Stan, Santanu Chaudhuri, Ian Foster
Flame Spray Pyrolysis (FSP) is a manufacturing technique to mass produce engineered nanoparticles for applications in catalysis, energy materials, composites, and more.
1 code implementation • 28 May 2020 • Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
no code implementations • 7 May 2020 • Ryan Chard, Yadu Babuji, Zhuozhao Li, Tyler Skluzacek, Anna Woodard, Ben Blaiszik, Ian Foster, Kyle Chard
These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e. g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available.
Distributed, Parallel, and Cluster Computing
no code implementations • 26 Nov 2019 • E. A. Huerta, Gabrielle Allen, Igor Andreoni, Javier M. Antelis, Etienne Bachelet, Bruce Berriman, Federica Bianco, Rahul Biswas, Matias Carrasco, Kyle Chard, Minsik Cho, Philip S. Cowperthwaite, Zachariah B. Etienne, Maya Fishbach, Francisco Förster, Daniel George, Tom Gibbs, Matthew Graham, William Gropp, Robert Gruendl, Anushri Gupta, Roland Haas, Sarah Habib, Elise Jennings, Margaret W. G. Johnson, Erik Katsavounidis, Daniel S. Katz, Asad Khan, Volodymyr Kindratenko, William T. C. Kramer, Xin Liu, Ashish Mahabal, Zsuzsa Marka, Kenton McHenry, Jonah Miller, Claudia Moreno, Mark Neubauer, Steve Oberlin, Alexander R. Olivas, Donald Petravick, Adam Rebei, Shawn Rosofsky, Milton Ruiz, Aaron Saxton, Bernard F. Schutz, Alex Schwing, Ed Seidel, Stuart L. Shapiro, Hongyu Shen, Yue Shen, Leo Singer, Brigitta M. Sipőcz, Lunan Sun, John Towns, Antonios Tsokaros, Wei Wei, Jack Wells, Timothy J. Williams, JinJun Xiong, Zhizhen Zhao
Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos.
no code implementations • 1 Feb 2019 • Gabrielle Allen, Igor Andreoni, Etienne Bachelet, G. Bruce Berriman, Federica B. Bianco, Rahul Biswas, Matias Carrasco Kind, Kyle Chard, Minsik Cho, Philip S. Cowperthwaite, Zachariah B. Etienne, Daniel George, Tom Gibbs, Matthew Graham, William Gropp, Anushri Gupta, Roland Haas, E. A. Huerta, Elise Jennings, Daniel S. Katz, Asad Khan, Volodymyr Kindratenko, William T. C. Kramer, Xin Liu, Ashish Mahabal, Kenton McHenry, J. M. Miller, M. S. Neubauer, Steve Oberlin, Alexander R. Olivas Jr, Shawn Rosofsky, Milton Ruiz, Aaron Saxton, Bernard Schutz, Alex Schwing, Ed Seidel, Stuart L. Shapiro, Hongyu Shen, Yue Shen, Brigitta M. Sipőcz, Lunan Sun, John Towns, Antonios Tsokaros, Wei Wei, Jack Wells, Timothy J. Williams, JinJun Xiong, Zhizhen Zhao
We discuss key aspects to realize this endeavor, namely (i) the design and exploitation of scalable and computationally efficient AI algorithms for Multi-Messenger Astrophysics; (ii) cyberinfrastructure requirements to numerically simulate astrophysical sources, and to process and interpret Multi-Messenger Astrophysics data; (iii) management of gravitational wave detections and triggers to enable electromagnetic and astro-particle follow-ups; (iv) a vision to harness future developments of machine and deep learning and cyberinfrastructure resources to cope with the scale of discovery in the Big Data Era; (v) and the need to build a community that brings domain experts together with data scientists on equal footing to maximize and accelerate discovery in the nascent field of Multi-Messenger Astrophysics.
no code implementations • 27 Nov 2018 • Ryan Chard, Zhuozhao Li, Kyle Chard, Logan Ward, Yadu Babuji, Anna Woodard, Steve Tuecke, Ben Blaiszik, Michael J. Franklin, Ian Foster
Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications.