Search Results for author: Joshua T. Vogelstein

Found 65 papers, 29 papers with code

Preserving Derivative Information while Transforming Neuronal Curves

no code implementations • 16 Mar 2023 • Thomas L. Athey, Daniel J. Tward, Ulrich Mueller, Laurent Younes, Joshua T. Vogelstein, Michael I. Miller

Then, the traces are mapped to common coordinate systems by transforming the positions of their points, which neglects how the transformation bends the line segments in between.

The Value of Out-of-Distribution Data

1 code implementation • 23 Aug 2022 • Ashwin De Silva, Rahul Ramesh, Carey E. Priebe, Pratik Chaudhari, Joshua T. Vogelstein

In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples.

Data Augmentation Hyperparameter Optimization

Graph Matching via Optimal Transport

1 code implementation • 9 Nov 2021 • Ali Saad-Eldin, Benjamin D. Pedigo, Carey E. Priebe, Joshua T. Vogelstein

The graph matching problem seeks to find an alignment between the nodes of two graphs that minimizes the number of adjacency disagreements.

Graph Matching
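The objective stated in the abstract above (an alignment that minimizes adjacency disagreements) can be illustrated with a minimal sketch. This is a brute-force search over all permutations on a toy example, not the optimal-transport method of the paper; the function names `adjacency_disagreements` and `brute_force_match` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def adjacency_disagreements(A, B, perm):
    """Count ordered pairs (i, j) where A[i][j] != B[perm[i]][perm[j]].

    For undirected graphs each mismatched edge is counted twice,
    once for (i, j) and once for (j, i).
    """
    n = len(A)
    return sum(
        A[i][j] != B[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )


def brute_force_match(A, B):
    """Return the permutation with the fewest disagreements.

    Exhaustive search is factorial in the number of nodes, so this is
    only feasible for very small graphs.
    """
    n = len(A)
    return min(
        permutations(range(n)),
        key=lambda p: adjacency_disagreements(A, B, p),
    )


# Toy example: two 3-node path graphs with different labelings.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path with centre node 1
B = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # path with centre node 0

best = brute_force_match(A, B)
print(best, adjacency_disagreements(A, B, best))
```

The identity alignment leaves mismatched entries, while the best alignment maps the centre of the first path to the centre of the second and leaves none. Practical solvers replace the exhaustive search with relaxations or heuristics.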

Simplest Streaming Trees

2 code implementations • 16 Oct 2021 • Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

In a benchmark suite containing 72 classification problems (the OpenML-CC18 data suite), we illustrate that our approach, Stream Decision Forest (SDF), does not suffer from either of the aforementioned limitations.

Continual Learning Transfer Learning

Towards a theory of out-of-distribution learning

no code implementations • 29 Sep 2021 • Ali Geisa, Ronak Mehta, Hayden S. Helm, Jayanta Dey, Eric Eaton, Jeffery Dick, Carey E. Priebe, Joshua T. Vogelstein

This assumption renders these theories inadequate for characterizing 21st century real-world data problems, which are typically characterized by evaluation distributions that differ from the training data distributions (referred to as out-of-distribution learning).

Learning Theory

Federated Causal Inference in Heterogeneous Observational Data

1 code implementation • 25 Jul 2021 • Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey

We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site.

Causal Inference

Hidden Markov Modeling for Maximum Likelihood Neuron Reconstruction

no code implementations • 4 Jun 2021 • Thomas L. Athey, Daniel J. Tward, Ulrich Mueller, Joshua T. Vogelstein, Michael I. Miller

Our most probable estimation method models the task of reconstructing neuronal processes in the presence of other neurons, and thus is applicable in images with several neurons.

Semantic Segmentation

A partition-based similarity for classification distributions

no code implementations • 12 Nov 2020 • Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christoper M. White, Ali Geisa, Joshua T. Vogelstein, Carey E. Priebe

Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners.

Classification General Classification +2

Multiple Network Embedding for Anomaly Detection in Time Series of Graphs

1 code implementation • 23 Aug 2020 • Guodong Chen, Jesús Arroyo, Avanti Athreya, Joshua Cape, Joshua T. Vogelstein, Youngser Park, Chris White, Jonathan Larson, Weiwei Yang, Carey E. Priebe

We examine two related, complementary inference tasks: the detection of anomalous graphs within a time series, and the detection of temporally anomalous vertices.

Methodology

Robust Similarity and Distance Learning via Decision Forests

no code implementations • 27 Jul 2020 • Tyler M. Tomita, Joshua T. Vogelstein

Many algorithms have been proposed for automated learning of suitable distances, most of which employ linear methods to learn a global metric over the feature space.

regression

mvlearn: Multiview Machine Learning in Python

no code implementations • 25 May 2020 • Ronan Perry, Gavin Mischler, Richard Guo, Theodore Lee, Alexander Chang, Arman Koul, Cameron Franz, Hugo Richard, Iain Carmichael, Pierre Ablin, Alexandre Gramfort, Joshua T. Vogelstein

As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have ballooned in recent years.

BIG-bench Machine Learning

A New Age of Computing and the Brain

no code implementations • 27 Apr 2020 • Polina Golland, Jack Gallant, Greg Hager, Hanspeter Pfister, Christos Papadimitriou, Stefan Schaal, Joshua T. Vogelstein

In December 2014, a two-day workshop supported by the Computing Community Consortium (CCC) and the National Science Foundation's Computer and Information Science and Engineering Directorate (NSF CISE) was convened in Washington, DC, with the goal of bringing together computer scientists and brain researchers to explore these new opportunities and connections, and develop a new, modern dialogue between the two research communities.

Omnidirectional Transfer for Quasilinear Lifelong Learning

1 code implementation • 27 Apr 2020 • Joshua T. Vogelstein, Jayanta Dey, Hayden S. Helm, Will LeVine, Ronak D. Mehta, Ali Geisa, Haoyin Xu, Gido M. van de Ven, Emily Chang, Chenyu Gao, Weiwei Yang, Bryan Tower, Jonathan Larson, Christopher M. White, Carey E. Priebe

But striving to avoid forgetting sets the goal unnecessarily low: the goal of lifelong learning, whether biological or artificial, should be to improve performance on all tasks (including past and future) with any new data.

Federated Learning Transfer Learning

The Chi-Square Test of Distance Correlation

1 code implementation • 27 Dec 2019 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data.

valid

High-dimensional and universally consistent k-sample tests

no code implementations • 20 Oct 2019 • Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein

The evaluation included several popular independence statistics and covered a comprehensive set of simulations.

Two-sample testing

Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks

1 code implementation • 25 Sep 2019 • Adam Li, Ronan Perry, Chester Huynh, Tyler M. Tomita, Ronak Mehta, Jesus Arroyo, Jesse Patsolic, Benjamin Falk, Joshua T. Vogelstein

In particular, Forests dominate other methods in tabular data, that is, when the feature space is unstructured, so that the signal is invariant to a permutation of the feature indices.

EEG Image Classification +1

AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python

1 code implementation • 6 Sep 2019 • Thomas L. Athey, Tingshan Liu, Benjamin D. Pedigo, Joshua T. Vogelstein

Background: Gaussian mixture modeling is a fundamental tool in clustering, as well as discriminant analysis and semiparametric density estimation.

Clustering Density Estimation

Independence Testing for Temporal Data

no code implementations • 18 Aug 2019 • Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, Joshua T. Vogelstein

While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test.

Time Series Time Series Analysis +1

Graphyti: A Semi-External Memory Graph Library for FlashGraph

no code implementations • 7 Jul 2019 • Disa Mhembere, Da Zheng, Carey E. Priebe, Joshua T. Vogelstein, Randal Burns

Emerging frameworks avoid the network bottleneck of distributed data with Semi-External Memory (SEM) that uses a single multicore node and operates on graphs larger than memory.

Distributed, Parallel, and Cluster Computing Databases

Geodesic Learning via Unsupervised Decision Forests

no code implementations • 5 Jul 2019 • Meghana Madhyastha, Percy Li, James Browne, Veronika Strnadova-Neeley, Carey E. Priebe, Randal Burns, Joshua T. Vogelstein

Empirical results on simulated and real data demonstrate that URerF is robust to high-dimensional noise, whereas other methods, such as Isomap, UMAP, and FLANN, quickly deteriorate in such settings.

hyppo: A Multivariate Hypothesis Testing Python Package

4 code implementations • 3 Jul 2019 • Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein

We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing.

Two-sample testing

Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities

1 code implementation • 30 Jun 2019 • Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein

Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty.

GraSPy: Graph Statistics in Python

2 code implementations • 29 Mar 2019 • Jaewon Chung, Benjamin D. Pedigo, Eric W. Bridgeford, Bijan K. Varjavand, Hayden S. Helm, Joshua T. Vogelstein

We introduce GraSPy, a Python library devoted to statistical inference, machine learning, and visualization of random graphs and graph populations.

BIG-bench Machine Learning

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

no code implementations • 14 Jun 2018 • Cencheng Shen, Joshua T. Vogelstein

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community.

Two-sample testing

Kernel k-Groups via Hartigan's Method

no code implementations • 26 Oct 2017 • Guilherme França, Maria L. Rizzo, Joshua T. Vogelstein

In this paper, we consider a formulation for the clustering problem using a weighted version of energy statistics in spaces of negative type.

Clustering Community Detection +1

From Distance Correlation to Multiscale Graph Correlation

1 code implementation • 26 Oct 2017 • Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein

Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age.

Statistical inference on random dot product graphs: a survey

no code implementations • 16 Sep 2017 • Avanti Athreya, Donniell E. Fishkind, Keith Levin, Vince Lyzinski, Youngser Park, Yichen Qin, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein, Carey E. Priebe

In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices.

Community Detection

Supervised Dimensionality Reduction for Big Data

1 code implementation • 5 Sep 2017 • Joshua T. Vogelstein, Eric Bridgeford, Minh Tang, Da Zheng, Christopher Douville, Randal Burns, Mauro Maggioni

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences.

Computational Efficiency General Classification +2

Joint Embedding of Graphs

2 code implementations • 10 Mar 2017 • Shangsi Wang, Jesús Arroyo, Joshua T. Vogelstein, Carey E. Priebe

Feature extraction and dimension reduction for networks is critical in a wide variety of domains.

Dimensionality Reduction

Probabilistic Fluorescence-Based Synapse Detection

no code implementations • 16 Nov 2016 • Anish K. Simhal, Cecilia Aguerrebere, Forrest Collman, Joshua T. Vogelstein, Kristina D. Micheva, Richard J. Weinberg, Stephen J. Smith, Guillermo Sapiro

The present work describes new probabilistic image analysis methods for single-synapse analysis of synapse populations in both animal and human brains.

Discovering and Deciphering Relationships Across Disparate Data Modalities

4 code implementations • 16 Sep 2016 • Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen

Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets.

Computational Efficiency

Connectome Smoothing via Low-rank Approximations

no code implementations • 6 Sep 2016 • Runze Tang, Michael Ketcha, Alexandra Badea, Evan D. Calabrese, Daniel S. Margulies, Joshua T. Vogelstein, Carey E. Priebe, Daniel L. Sussman

In statistical connectomics, the quantitative study of brain networks, estimating the mean of a population of graphs based on a sample is a core problem.

knor: A NUMA-Optimized In-Memory, Distributed and Semi-External-Memory k-means Library

1 code implementation • 28 Jun 2016 • Disa Mhembere, Da Zheng, Carey E. Priebe, Joshua T. Vogelstein, Randal Burns

The k-means NUMA Optimized Routine (knor) library has (i) in-memory (knori), (ii) distributed memory (knord), and (iii) semi-external memory (knors) modules that radically improve the performance of k-means for varying memory and hardware budgets.

Distributed, Parallel, and Cluster Computing

Deformably Registering and Annotating Whole CLARITY Brains to an Atlas via Masked LDDMM

1 code implementation • 6 May 2016 • Kwame S. Kutten, Joshua T. Vogelstein, Nicolas Charon, Li Ye, Karl Deisseroth, Michael I. Miller

Therefore, we developed a method (Mask-LDDMM) for registering CLARITY images that automatically finds the brain boundary and learns the optimal deformation between the brain and atlas masks.

FlashR: R-Programmed Parallel and Scalable Machine Learning using SSDs

2 code implementations • 21 Apr 2016 • Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, Randal Burns

R is one of the most popular programming languages for statistics and machine learning, but the R framework is relatively slow and unable to scale to large datasets.

Distributed, Parallel, and Cluster Computing

Quantifying mesoscale neuroanatomy using X-ray microtomography

no code implementations • 13 Apr 2016 • Eva L. Dyer, William Gray Roncal, Hugo L. Fernandes, Doga Gürsoy, Vincent De Andrade, Rafael Vescovi, Kamel Fezzaa, Xianghui Xiao, Joshua T. Vogelstein, Chris Jacobsen, Konrad P. Körding, Narayanan Kasthuri

Methods for resolving the 3D microstructure of the brain typically start by thinly slicing and staining the brain, and then imaging each individual section with visible light photons or electrons.

Sparse Projection Oblique Randomer Forests

2 code implementations • 10 Jun 2015 • Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein

Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency.

Computational Efficiency

Manifold Matching using Shortest-Path Distance and Joint Neighborhood Selection

1 code implementation • 12 Dec 2014 • Cencheng Shen, Joshua T. Vogelstein, Carey E. Priebe

Then the shortest-path distance within each modality is calculated from the joint neighborhood graph, followed by embedding into and matching in a common low-dimensional Euclidean space.

Covariate-assisted spectral clustering

no code implementations • 8 Nov 2014 • Norbert Binkiewicz, Joshua T. Vogelstein, Karl Rohe

We utilize these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering.

Clustering

Graph Matching: Relax at Your Own Risk

no code implementations • 13 May 2014 • Vince Lyzinski, Donniell Fishkind, Marcelo Fiori, Joshua T. Vogelstein, Carey E. Priebe, Guillermo Sapiro

Indeed, experimental results illuminate and corroborate these theoretical findings, demonstrating that excellent results are achieved in both benchmark and real data problems by amalgamating the two approaches.

Graph Matching

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes

no code implementations • 16 Apr 2014 • Ayushi Sinha, William Gray Roncal, Narayanan Kasthuri, Ming Chuang, Priya Manavalan, Dean M. Kleissas, Joshua T. Vogelstein, R. Jacob Vogelstein, Randal Burns, Jeff W. Lichtman, Michael Kazhdan

The contribution of this work is the introduction of a straightforward and robust pipeline which annotates axoplasmic reticula with high precision, contributing towards advancements in automatic feature annotations in neural EM data.

VESICLE: Volumetric Evaluation of Synaptic Interfaces using Computer vision at Large Scale

no code implementations • 14 Mar 2014 • William Gray Roncal, Michael Pekala, Verena Kaynig-Fittkau, Dean M. Kleissas, Joshua T. Vogelstein, Hanspeter Pfister, Randal Burns, R. Jacob Vogelstein, Mark A. Chevillet, Gregory D. Hager

An open challenge problem at the forefront of modern neuroscience is to obtain a comprehensive mapping of the neural pathways that underlie human brain function; an enhanced understanding of the wiring diagram of the brain promises to lead to new breakthroughs in diagnosing and treating neurological disorders.

Object Detection

Real-Time Inference for a Gamma Process Model of Neural Spiking

no code implementations • NeurIPS 2013 • David E. Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin

With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons.

Statistical inference on errorfully observed graphs

no code implementations • 15 Nov 2012 • Carey E. Priebe, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein

Thus we errorfully observe $G$ when we observe the graph $\widetilde{G} = (V,\widetilde{E})$ as the edges in $\widetilde{E}$ arise from the classifications of the "edge-features", and are expected to be errorful.
