Search Results for author: Brandon Duderstadt

Found 9 papers, 4 papers with code

CoRNStack: High-Quality Contrastive Data for Better Code Ranking

1 code implementation · 1 Dec 2024 · Tarun Suresh, Revanth Gangi Reddy, Yifei Xu, Zach Nussbaum, Andriy Mulyar, Brandon Duderstadt, Heng Ji

Effective code retrieval plays a crucial role in advancing code generation, bug fixing, and software maintenance, particularly as software systems increase in complexity.

Bug fixing, Code Generation, +2 more

Embedding-based statistical inference on generative models

no code implementations · 1 Oct 2024 · Hayden Helm, Aranyak Acharyya, Brandon Duderstadt, Youngser Park, Carey E. Priebe

We demonstrate that using the perspective space as the basis of a notion of "similar" is effective for multiple model-level inference tasks.

In-Context Learning, parameter-efficient fine-tuning

Tracking the perspectives of interacting language models

no code implementations · 17 Jun 2024 · Hayden Helm, Brandon Duderstadt, Youngser Park, Carey E. Priebe

Large language models (LLMs) are capable of producing high-quality information at unprecedented rates.

Retrieval

Nomic Embed Vision: Expanding the Latent Space

1 code implementation · 6 Jun 2024 · Zach Nussbaum, Brandon Duderstadt, Andriy Mulyar

This technical report describes the training of nomic-embed-vision, a highly performant, open-code, open-weights image embedding model that shares the same latent space as nomic-embed-text.

Nomic Embed: Training a Reproducible Long Context Text Embedder

1 code implementation · 2 Feb 2024 · Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar

This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short and long-context tasks.

GPT4All: An Ecosystem of Open Source Compressed Language Models

1 code implementation · 6 Nov 2023 · Yuvanesh Anand, Zach Nussbaum, Adam Treat, Aaron Miller, Richard Guo, Ben Schmidt, GPT4All Community, Brandon Duderstadt, Andriy Mulyar

It is our hope that this paper acts as both a technical overview of the original GPT4All models as well as a case study on the subsequent growth of the GPT4All open source ecosystem.

Comparing Foundation Models using Data Kernels

no code implementations · 9 May 2023 · Brandon Duderstadt, Hayden S. Helm, Carey E. Priebe

Further, we demonstrate how our methodology can be extended to facilitate population level model comparison.

Benchmarking, Self-Supervised Learning, +1 more

A partition-based similarity for classification distributions

no code implementations · 12 Nov 2020 · Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christopher M. White, Ali Geisa, Joshua T. Vogelstein, Carey E. Priebe

Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners.

Classification, General Classification, +2 more
