1 code implementation • 1 Dec 2024 • Tarun Suresh, Revanth Gangi Reddy, Yifei Xu, Zach Nussbaum, Andriy Mulyar, Brandon Duderstadt, Heng Ji
Effective code retrieval plays a crucial role in advancing code generation, bug fixing, and software maintenance, particularly as software systems increase in complexity.
no code implementations • 1 Oct 2024 • Hayden Helm, Aranyak Acharyya, Brandon Duderstadt, Youngser Park, Carey E. Priebe
We demonstrate that using the perspective space as the basis of a notion of "similar" is effective for multiple model-level inference tasks.
no code implementations • 17 Jun 2024 • Hayden Helm, Brandon Duderstadt, Youngser Park, Carey E. Priebe
Large language models (LLMs) are capable of producing high-quality information at unprecedented rates.
1 code implementation • 6 Jun 2024 • Zach Nussbaum, Brandon Duderstadt, Andriy Mulyar
This technical report describes the training of nomic-embed-vision, a highly performant, open-code, open-weights image embedding model that shares the same latent space as nomic-embed-text.
1 code implementation • 2 Feb 2024 • Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar
This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short and long-context tasks.
1 code implementation • 6 Nov 2023 • Yuvanesh Anand, Zach Nussbaum, Adam Treat, Aaron Miller, Richard Guo, Ben Schmidt, GPT4All Community, Brandon Duderstadt, Andriy Mulyar
It is our hope that this paper acts as both a technical overview of the original GPT4All models and a case study on the subsequent growth of the GPT4All open-source ecosystem.
no code implementations • 9 May 2023 • Brandon Duderstadt, Hayden S. Helm, Carey E. Priebe
Further, we demonstrate how our methodology can be extended to facilitate population-level model comparison.
no code implementations • 12 Nov 2020 • Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christopher M. White, Ali Geisa, Joshua T. Vogelstein, Carey E. Priebe
Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners.
no code implementations • 9 Mar 2018 • Gregory Kiar, Robert J. Anderson, Alex Baden, Alexandra Badea, Eric W. Bridgeford, Andrew Champion, Vikram Chandrashekhar, Forrest Collman, Brandon Duderstadt, Alan C. Evans, Florian Engert, Benjamin Falk, Tristan Glatard, William R. Gray Roncal, David N. Kennedy, Jeremy Maitin-Shepard, Ryan A. Marren, Onyeka Nnaemeka, Eric Perlman, Sharmishtaas Seshamani, Eric T. Trautman, Daniel J. Tward, Pedro Antonio Valdés-Sosa, Qing Wang, Michael I. Miller, Randal Burns, Joshua T. Vogelstein
Neuroscientists are now able to acquire data at staggering rates across spatiotemporal scales.