no code implementations • 1 Apr 2024 • Camille Olivia Little, Genevera I. Allen
Ensemble methods, particularly boosting, have established themselves as highly effective and widely embraced machine learning techniques for tabular data.
no code implementations • 6 Oct 2023 • Camille Olivia Little, Debolina Halder Lina, Genevera I. Allen
Specifically, we develop a novel fair feature importance score for trees that can be used to interpret how each feature contributes to fairness or bias in trees, tree-based ensembles, or tree-based surrogates of any complex ML system.
no code implementations • 13 Sep 2023 • Madeline Navarro, Camille Little, Genevera I. Allen, Santiago Segarra
Furthermore, our method allows us to use the generalization ability of mixup to improve both fairness and accuracy.
no code implementations • 2 Aug 2023 • Genevera I. Allen, Luqin Gan, Lili Zheng
In this paper, we discuss and review the field of interpretable machine learning, focusing especially on the techniques as they are often employed to generate new knowledge or make discoveries from large data sets.
no code implementations • 22 May 2023 • Andersen Chang, Lili Zheng, Gautam Dasarathy, Genevera I. Allen
Probabilistic graphical models have become an important unsupervised learning tool for detecting network structures for a variety of problems, including the estimation of functional neuronal connectivity from two-photon calcium imaging data.
no code implementations • 17 Sep 2022 • Andersen Chang, Lili Zheng, Genevera I. Allen
This leads to the Graph Quilting problem, as first introduced by (Vinci et al.
no code implementations • 5 Jun 2022 • Luqin Gan, Lili Zheng, Genevera I. Allen
Our approach is fast as we avoid model refitting by leveraging a form of random observation and feature subsampling called minipatch ensembles; this approach also improves statistical power by avoiding data splitting.
no code implementations • 1 Nov 2021 • Madeline Navarro, Genevera I. Allen, Michael Weylandt
In this paper, we propose a convex approach for the task of network clustering.
no code implementations • 22 Oct 2021 • Tianyi Yao, Minjie Wang, Genevera I. Allen
Gaussian graphical models provide a powerful framework for uncovering conditional dependence relationships between sets of nodes; they have found applications in a wide variety of fields including sensor and communication networks, physics, finance, and computational biology.
no code implementations • 5 Oct 2021 • Luqin Gan, Genevera I. Allen
Additionally, we develop adaptive sampling schemes for observations, which result in both improved reliability and computational savings, as well as adaptive sampling schemes for features, which lead to interpretable solutions by quickly learning the most relevant features that differentiate clusters.
no code implementations • 13 Apr 2021 • Minjie Wang, Genevera I. Allen
In neuroscience, researchers seek to uncover the connectivity of neurons from large-scale neural recordings or imaging; often people employ graphical model selection and estimation techniques for this purpose.
no code implementations • 8 Dec 2020 • Michael Weylandt, T. Mitchell Roddenberry, Genevera I. Allen
In contrast to common practice which denoises then clusters, our method is a unified, convex approach that performs both simultaneously.
1 code implementation • 15 Nov 2020 • Kelly Geyer, Frederick Campbell, Andersen Chang, John Magnotti, Michael Beauchamp, Genevera I. Allen
After signal processing, this type of data may be organized as a 4-way tensor with dimensions representing trials, electrodes, frequency, and time.
no code implementations • 14 Nov 2020 • Mohammad Taha Toghani, Genevera I. Allen
We achieve this by developing MP-Boost, an algorithm loosely based on AdaBoost that learns by adaptively selecting small subsets of instances and features, or what we term minipatches (MP), at each iteration.
no code implementations • 16 Oct 2020 • Tianyi Yao, Genevera I. Allen
While feature selection is a well-studied problem with many widely used techniques, there are typically two key challenges: i) many existing approaches become computationally intractable in huge-data settings with millions of observations and features; and ii) the statistical accuracy of selected features degrades in high-noise, high-correlation settings, thus hindering reliable model interpretation.
1 code implementation • 25 May 2020 • Minjie Wang, Tianyi Yao, Genevera I. Allen
Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications.
no code implementations • 11 Dec 2019 • Minjie Wang, Genevera I. Allen
While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that will inherit the strong statistical, mathematical and empirical properties of increasingly popular convex clustering methods.
no code implementations • 30 May 2019 • Tianyi Yao, Genevera I. Allen
Knowledge of functional groupings of neurons can shed light on structures of neural circuits and is valuable in many types of neuroimaging studies.
no code implementations • 27 Mar 2019 • Yulia Baker, Tiffany M. Tang, Genevera I. Allen
B-RAIL serves as a versatile data integration method for sparse regression and graph selection, and we demonstrate the effectiveness of B-RAIL through extensive simulations and a case study to infer the ovarian cancer gene regulatory network.
1 code implementation • 6 Jan 2019 • Michael Weylandt, John Nagorski, Genevera I. Allen
Convex clustering is a promising new approach to the classical problem of clustering, combining strong performance in empirical studies with rigorous theoretical foundations.
1 code implementation • 31 Aug 2016 • David I. Inouye, Eunho Yang, Genevera I. Allen, Pradeep Ravikumar
The Poisson distribution has been widely studied and used for modeling univariate count-valued data.
no code implementations • 2 Nov 2014 • Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Yulia Baker, Ying-Wooi Wan, Zhandong Liu
"Mixed Data" comprising a large number of heterogeneous variables (e.g., count, binary, continuous, skewed continuous, among other data types) are prevalent in varied areas such as genomics and proteomics, imaging genetics, national security, social networking, and Internet advertising.
no code implementations • 5 Aug 2014 • Eric C. Chi, Genevera I. Allen, Richard G. Baraniuk
In the biclustering problem, we seek to simultaneously group observations and features.
no code implementations • NeurIPS 2013 • Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu
Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables.
no code implementations • NeurIPS 2013 • Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu
We thus introduce a “novel subclass of CRFs”, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families.
1 code implementation • 11 Sep 2013 • Genevera I. Allen, Michael Weylandt
We propose a unified approach to regularized PCA which can induce both sparsity and smoothness in both the row and column principal components.
no code implementations • 17 Jan 2013 • Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Undirected graphical models, or Markov networks, are a popular class of statistical models, used in a wide variety of applications.