Search Results for author: Joshua T. Vogelstein

Found 55 papers, 23 papers with code

Streaming Decision Trees and Forests

1 code implementation16 Oct 2021 Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

However, there is still room for improvement on streaming algorithms based on batch decision trees and random forests, which are the leading methods in batch data tasks.

Transfer Learning

Towards a theory of out-of-distribution learning

no code implementations29 Sep 2021 Ali Geisa, Ronak Mehta, Hayden S. Helm, Jayanta Dey, Eric Eaton, Jeffery Dick, Carey E. Priebe, Joshua T. Vogelstein

This assumption renders these theories inadequate for characterizing 21$^{st}$ century real world data problems, which are typically characterized by evaluation distributions that differ from the training data distributions (referred to as out-of-distribution learning).

Learning Theory

Federated Causal Inference in Heterogeneous Observational Data

no code implementations25 Jul 2021 Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey

Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets.

Causal Inference

A partition-based similarity for classification distributions

no code implementations12 Nov 2020 Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christoper M. White, Ali Geisa, Joshua T. Vogelstein, Carey E. Priebe

Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners.

Classification General Classification +2

Multiple Network Embedding for Anomaly Detection in Time Series of Graphs

1 code implementation23 Aug 2020 Guodong Chen, Jesús Arroyo, Avanti Athreya, Joshua Cape, Joshua T. Vogelstein, Youngser Park, Chris White, Jonathan Larson, Weiwei Yang, Carey E. Priebe

We examine two related, complementary inference tasks: the detection of anomalous graphs within a time series, and the detection of temporally anomalous vertices.


Robust Similarity and Distance Learning via Decision Forests

no code implementations27 Jul 2020 Tyler M. Tomita, Joshua T. Vogelstein

Many algorithms have been proposed for automated learning of suitable distances, most of which employ linear methods to learn a global metric over the feature space.

mvlearn: Multiview Machine Learning in Python

no code implementations25 May 2020 Ronan Perry, Gavin Mischler, Richard Guo, Theodore Lee, Alexander Chang, Arman Koul, Cameron Franz, Hugo Richard, Iain Carmichael, Pierre Ablin, Alexandre Gramfort, Joshua T. Vogelstein

As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have ballooned in recent years.

Learning to rank via combining representations

2 code implementations20 May 2020 Hayden S. Helm, Amitabh Basu, Avanti Athreya, Youngser Park, Joshua T. Vogelstein, Michael Winding, Marta Zlatic, Albert Cardona, Patrick Bourke, Jonathan Larson, Chris White, Carey E. Priebe

Learning to rank -- producing a ranked list of items specific to a query and with respect to a set of supervisory items -- is a problem of general interest.


Omnidirectional Transfer for Quasilinear Lifelong Learning

1 code implementation27 Apr 2020 Joshua T. Vogelstein, Jayanta Dey, Hayden S. Helm, Will LeVine, Ronak D. Mehta, Ali Geisa, Haoyin Xu, Gido M. van de Ven, Emily Chang, Chenyu Gao, Weiwei Yang, Bryan Tower, Jonathan Larson, Christopher M. White, Carey E. Priebe

But striving to avoid forgetting sets the goal unnecessarily low: the goal of lifelong learning, whether biological or artificial, should be to improve performance on all tasks (including past and future) with any new data.

Federated Learning Transfer Learning

A New Age of Computing and the Brain

no code implementations27 Apr 2020 Polina Golland, Jack Gallant, Greg Hager, Hanspeter Pfister, Christos Papadimitriou, Stefan Schaal, Joshua T. Vogelstein

In December 2014, a two-day workshop supported by the Computing Community Consortium (CCC) and the National Science Foundation's Computer and Information Science and Engineering Directorate (NSF CISE) was convened in Washington, DC, with the goal of bringing together computer scientists and brain researchers to explore these new opportunities and connections, and develop a new, modern dialogue between the two research communities.

The Chi-Square Test of Distance Correlation

1 code implementation27 Dec 2019 Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amount of data.

Nonpar MANOVA via Independence Testing

no code implementations20 Oct 2019 Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein

The $k$-sample testing problem tests whether or not $k$ groups of data points are sampled from the same distribution.

Two-sample testing

Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks

1 code implementation25 Sep 2019 Ronan Perry, Adam Li, Chester Huynh, Tyler M. Tomita, Ronak Mehta, Jesus Arroyo, Jesse Patsolic, Benjamin Falk, Joshua T. Vogelstein

In particular, Forests dominate other methods in tabular data, that is, when the feature space is unstructured, so that the signal is invariant to a permutation of the feature indices.

EEG Image Classification +1

AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python

1 code implementation6 Sep 2019 Thomas L. Athey, Tingshan Liu, Benjamin D. Pedigo, Joshua T. Vogelstein

Background: Gaussian mixture modeling is a fundamental tool in clustering, as well as discriminant analysis and semiparametric density estimation.

Density Estimation

Independence Testing for Multivariate Time Series

no code implementations18 Aug 2019 Ronak Mehta, Jaewon Chung, Cencheng Shen, Ting Xu, Joshua T. Vogelstein

The proposed nonparametric procedure is valid and consistent, building upon prior work by characterizing the geometry of the relationship, estimating the time lag at which dependence is maximized, avoiding the need for multiple testing, and exhibiting superior power in high-dimensional, low sample size, nonlinear settings.

Time Series Time Series Analysis

Graphyti: A Semi-External Memory Graph Library for FlashGraph

no code implementations7 Jul 2019 Disa Mhembere, Da Zheng, Carey E. Priebe, Joshua T. Vogelstein, Randal Burns

Emerging frameworks avoid the network bottleneck of distributed data with Semi-External Memory (SEM) that uses a single multicore node and operates on graphs larger than memory.

Distributed, Parallel, and Cluster Computing Databases

Geodesic Learning via Unsupervised Decision Forests

no code implementations5 Jul 2019 Meghana Madhyastha, Percy Li, James Browne, Veronika Strnadova-Neeley, Carey E. Priebe, Randal Burns, Joshua T. Vogelstein

Empirical results on simulated and real data demonstrate that URerF is robust to high-dimensional noise, where as other methods, such as Isomap, UMAP, and FLANN, quickly deteriorate in such settings.

hyppo: A Multivariate Hypothesis Testing Python Package

4 code implementations3 Jul 2019 Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein

We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing.

Two-sample testing

Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities

1 code implementation30 Jun 2019 Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein

Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty.

GraSPy: Graph Statistics in Python

1 code implementation29 Mar 2019 Jaewon Chung, Benjamin D. Pedigo, Eric W. Bridgeford, Bijan K. Varjavand, Hayden S. Helm, Joshua T. Vogelstein

We introduce GraSPy, a Python library devoted to statistical inference, machine learning, and visualization of random graphs and graph populations.

Learning Interpretable Characteristic Kernels via Decision Forests

no code implementations30 Nov 2018 Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

It has been demonstrated that these proximity matrices can be thought of as kernels, connecting the decision forest literature to the extensive kernel machine literature.

Feature Importance General Classification

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

no code implementations14 Jun 2018 Cencheng Shen, Joshua T. Vogelstein

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community.

Two-sample testing

Kernel k-Groups via Hartigan's Method

no code implementations26 Oct 2017 Guilherme França, Maria L. Rizzo, Joshua T. Vogelstein

In this paper, we consider a formulation for the clustering problem using a weighted version of energy statistics in spaces of negative type.

Community Detection graph partitioning

From Distance Correlation to Multiscale Graph Correlation

1 code implementation26 Oct 2017 Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein

Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age.

Statistical inference on random dot product graphs: a survey

no code implementations16 Sep 2017 Avanti Athreya, Donniell E. Fishkind, Keith Levin, Vince Lyzinski, Youngser Park, Yichen Qin, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein, Carey E. Priebe

In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices.

Community Detection

Supervised Dimensionality Reduction for Big Data

1 code implementation5 Sep 2017 Joshua T. Vogelstein, Eric Bridgeford, Minh Tang, Da Zheng, Christopher Douville, Randal Burns, Mauro Maggioni

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences.

General Classification Supervised dimensionality reduction

Joint Embedding of Graphs

2 code implementations10 Mar 2017 Shangsi Wang, Jesús Arroyo, Joshua T. Vogelstein, Carey E. Priebe

Feature extraction and dimension reduction for networks is critical in a wide variety of domains.

Dimensionality Reduction

Probabilistic Fluorescence-Based Synapse Detection

no code implementations16 Nov 2016 Anish K. Simhal, Cecilia Aguerrebere, Forrest Collman, Joshua T. Vogelstein, Kristina D. Micheva, Richard J. Weinberg, Stephen J. Smith, Guillermo Sapiro

The present work describes new probabilistic image analysis methods for single-synapse analysis of synapse populations in both animal and human brains.

Discovering and Deciphering Relationships Across Disparate Data Modalities

4 code implementations16 Sep 2016 Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen

Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets.

Connectome Smoothing via Low-rank Approximations

no code implementations6 Sep 2016 Runze Tang, Michael Ketcha, Alexandra Badea, Evan D. Calabrese, Daniel S. Margulies, Joshua T. Vogelstein, Carey E. Priebe, Daniel L. Sussman

In statistical connectomics, the quantitative study of brain networks, estimating the mean of a population of graphs based on a sample is a core problem.

knor: A NUMA-Optimized In-Memory, Distributed and Semi-External-Memory k-means Library

1 code implementation28 Jun 2016 Disa Mhembere, Da Zheng, Carey E. Priebe, Joshua T. Vogelstein, Randal Burns

The \textit{k-means NUMA Optimized Routine} (\textsf{knor}) library has (i) in-memory (\textsf{knori}), (ii) distributed memory (\textsf{knord}), and (iii) semi-external memory (\textsf{knors}) modules that radically improve the performance of k-means for varying memory and hardware budgets.

Distributed, Parallel, and Cluster Computing

Deformably Registering and Annotating Whole CLARITY Brains to an Atlas via Masked LDDMM

1 code implementation6 May 2016 Kwame S. Kutten, Joshua T. Vogelstein, Nicolas Charon, Li Ye, Karl Deisseroth, Michael I. Miller

Therefore, we developed a method (Mask-LDDMM) for registering CLARITY images, that automatically find the brain boundary and learns the optimal deformation between the brain and atlas masks.

FlashR: R-Programmed Parallel and Scalable Machine Learning using SSDs

1 code implementation21 Apr 2016 Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, Randal Burns

R is one of the most popular programming languages for statistics and machine learning, but the R framework is relatively slow and unable to scale to large datasets.

Distributed, Parallel, and Cluster Computing

Quantifying mesoscale neuroanatomy using X-ray microtomography

no code implementations13 Apr 2016 Eva L. Dyer, William Gray Roncal, Hugo L. Fernandes, Doga Gürsoy, Vincent De Andrade, Rafael Vescovi, Kamel Fezzaa, Xianghui Xiao, Joshua T. Vogelstein, Chris Jacobsen, Konrad P. Körding, Narayanan Kasthuri

Methods for resolving the 3D microstructure of the brain typically start by thinly slicing and staining the brain, and then imaging each individual section with visible light photons or electrons.

Sparse Projection Oblique Randomer Forests

2 code implementations10 Jun 2015 Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein

Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency.

Manifold Matching using Shortest-Path Distance and Joint Neighborhood Selection

1 code implementation12 Dec 2014 Cencheng Shen, Joshua T. Vogelstein, Carey E. Priebe

Then the shortest-path distance within each modality is calculated from the joint neighborhood graph, followed by embedding into and matching in a common low-dimensional Euclidean space.

Covariate-assisted spectral clustering

no code implementations8 Nov 2014 Norbert Binkiewicz, Joshua T. Vogelstein, Karl Rohe

We utilize these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering.

Graph Matching: Relax at Your Own Risk

no code implementations13 May 2014 Vince Lyzinski, Donniell Fishkind, Marcelo Fiori, Joshua T. Vogelstein, Carey E. Priebe, Guillermo Sapiro

Indeed, experimental results illuminate and corroborate these theoretical findings, demonstrating that excellent results are achieved in both benchmark and real data problems by amalgamating the two approaches.

Graph Matching

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes

no code implementations16 Apr 2014 Ayushi Sinha, William Gray Roncal, Narayanan Kasthuri, Ming Chuang, Priya Manavalan, Dean M. Kleissas, Joshua T. Vogelstein, R. Jacob Vogelstein, Randal Burns, Jeff W. Lichtman, Michael Kazhdan

The contribution of this work is the introduction of a straightforward and robust pipeline which annotates axoplasmic reticula with high precision, contributing towards advancements in automatic feature annotations in neural EM data.

VESICLE: Volumetric Evaluation of Synaptic Interfaces using Computer vision at Large Scale

no code implementations14 Mar 2014 William Gray Roncal, Michael Pekala, Verena Kaynig-Fittkau, Dean M. Kleissas, Joshua T. Vogelstein, Hanspeter Pfister, Randal Burns, R. Jacob Vogelstein, Mark A. Chevillet, Gregory D. Hager

An open challenge problem at the forefront of modern neuroscience is to obtain a comprehensive mapping of the neural pathways that underlie human brain function; an enhanced understanding of the wiring diagram of the brain promises to lead to new breakthroughs in diagnosing and treating neurological disorders.

Electron Microscopy Object Detection

Real-Time Inference for a Gamma Process Model of Neural Spiking

no code implementations NeurIPS 2013 David E. Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin

With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons.

Statistical inference on errorfully observed graphs

no code implementations15 Nov 2012 Carey E. Priebe, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein

Thus we errorfully observe $G$ when we observe the graph $\widetilde{G} = (V,\widetilde{E})$ as the edges in $\widetilde{E}$ arise from the classifications of the "edge-features", and are expected to be errorful.

Cannot find the paper you are looking for? You can Submit a new open access paper.