Search Results for author: Alex Gittens

Found 25 papers, 7 papers with code

Aligners: Decoupling LLMs and Alignment

no code implementations 7 Mar 2024 Lilian Ngweta, Mayank Agarwal, Subha Maity, Alex Gittens, Yuekai Sun, Mikhail Yurochkin

Large Language Models (LLMs) need to be aligned with human expectations to ensure their safety and utility in most applications.

Improving Neural Ranking Models with Traditional IR Methods

1 code implementation 29 Aug 2023 Anik Saha, Oktie Hassanzadeh, Alex Gittens, Jian Ni, Kavitha Srinivas, Bulent Yener

Neural ranking methods based on large transformer models have recently gained significant attention in the information retrieval community, and have been adopted by major commercial solutions.

Information Retrieval Retrieval

A Cross-Domain Evaluation of Approaches for Causal Knowledge Extraction

1 code implementation 7 Aug 2023 Anik Saha, Oktie Hassanzadeh, Alex Gittens, Jian Ni, Kavitha Srinivas, Bulent Yener

Causal knowledge extraction is the task of extracting relevant causes and effects from text by detecting the causal relation.

Binary Classification

Deception by Omission: Using Adversarial Missingness to Poison Causal Structure Learning

no code implementations 31 May 2023 Deniz Koyuncu, Alex Gittens, Bülent Yener, Moti Yung

Inference of causal structures from observational data is a key component of causal machine learning; in practice, this data may be incompletely observed.

Reduced Label Complexity For Tight $\ell_2$ Regression

no code implementations 12 May 2023 Alex Gittens, Malik Magdon-Ismail

Open question: Can label complexity be reduced by $\Omega(n)$ with tight $(1+d/n)$-approximation?

Open-Ended Question Answering regression

Word Sense Induction with Knowledge Distillation from BERT

no code implementations 20 Apr 2023 Anik Saha, Alex Gittens, Bulent Yener

This paper proposes a two-stage method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context and transferring this sense information to fit multi-sense embeddings in a skip-gram-like framework.

Knowledge Distillation Language Modelling +3

Simple Disentanglement of Style and Content in Visual Representations

1 code implementation 20 Feb 2023 Lilian Ngweta, Subha Maity, Alex Gittens, Yuekai Sun, Mikhail Yurochkin

Learning visual representations with interpretable features, i.e., disentangled representations, remains a challenging problem.

Disentanglement Domain Generalization

Output Randomization: A Novel Defense for both White-box and Black-box Adversarial Models

no code implementations 8 Jul 2021 Daniel Park, Haidar Khan, Azer Khan, Alex Gittens, Bülent Yener

Adversarial examples pose a threat to deep neural network models in a variety of scenarios, from "white box" settings, where the adversary has complete knowledge of the model, to the opposite "black box" settings.

Reading StackOverflow Encourages Cheating: Adding Question Text Improves Extractive Code Generation

1 code implementation ACL (NLP4Prog) 2021 Gabriel Orlanski, Alex Gittens

We evaluate prior state-of-the-art CoNaLa models with this additional data and find that our proposed method of using the body and mined data beats the BLEU score of the prior state-of-the-art by $71.96\%$.

Code Generation

Learning Fair Canonical Polyadical Decompositions using a Kernel Independence Criterion

no code implementations 27 Apr 2021 Kevin Kim, Alex Gittens

This work proposes to learn fair low-rank tensor decompositions by regularizing the Canonical Polyadic Decomposition factorization with the kernel Hilbert-Schmidt independence criterion (KHSIC).

Fairness

NoisyCUR: An algorithm for two-cost budgeted matrix completion

1 code implementation 16 Apr 2021 Dong Hu, Alex Gittens, Malik Magdon-Ismail

Specifically, we consider that it is possible to obtain low noise, high cost observations of individual entries or high noise, low cost observations of entire columns.

Matrix Completion Vocal Bursts Valence Prediction

TINKER: A framework for Open source Cyberthreat Intelligence

no code implementations 10 Feb 2021 Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki, Alex Gittens, Charu Aggarwal

The information is extracted and stored in a structured format using knowledge graphs such that the semantics of the threat intelligence can be preserved and shared at scale with other security analysts.

Information Retrieval Intrusion Detection +3

MALOnt: An Ontology for Malware Threat Intelligence

1 code implementation 20 Jun 2020 Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki, Alex Gittens, Charu Aggarwal

The knowledge graph that uses MALOnt is instantiated from a corpus comprising hundreds of annotated malware threat reports.

Decision Making Graph Generation +1

Fast Fixed Dimension L2-Subspace Embeddings of Arbitrary Accuracy, With Application to L1 and L2 Tasks

no code implementations 27 Sep 2019 Malik Magdon-Ismail, Alex Gittens

We give a fast oblivious L2-embedding of $A\in \mathbb{R}^{n \times d}$ to $B\in \mathbb{R}^{r \times d}$ satisfying $(1-\varepsilon)\|Ax\|_2^2 \le \|Bx\|_2^2 \le (1+\varepsilon)\|Ax\|_2^2$. Our embedding dimension $r$ equals $d$, a constant independent of the distortion $\varepsilon$.

Skip-Gram − Zipf + Uniform = Vector Additivity

no code implementations ACL 2017 Alex Gittens, Dimitris Achlioptas, Michael W. Mahoney

An unexpected "side-effect" of such models is that their vectors often exhibit compositionality, i.e., adding two word-vectors results in a vector that is only a small angle away from the vector of a word representing the semantic composite of the original words, e.g., "man" + "royal" = "king".

Dimensionality Reduction Word Embeddings

Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

no code implementations 9 Jun 2017 Shusen Wang, Alex Gittens, Michael W. Mahoney

This work analyzes the application of this paradigm to kernel $k$-means clustering, and shows that applying the linear $k$-means clustering algorithm to $\frac{k}{\epsilon} (1 + o(1))$ features constructed using a so-called rank-restricted Nyström approximation results in cluster assignments that satisfy a $1 + \epsilon$ approximation ratio in terms of the kernel $k$-means cost function, relative to the guarantee provided by the same algorithm without the use of the Nyström method.

Clustering

Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

1 code implementation 5 Jul 2016 Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael W. Mahoney, Prabhat

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms.

Distributed, Parallel, and Cluster Computing G.1.3; C.2.4

Hardware Compliant Approximate Image Codes

no code implementations CVPR 2015 Da Kuang, Alex Gittens, Raffay Hamid

In recent years, several feature encoding schemes for the bags-of-visual-words model have been proposed.

Computational Efficiency General Classification

Tensor machines for learning target-specific polynomial features

no code implementations 7 Apr 2015 Jiyan Yang, Alex Gittens

Recent years have demonstrated that using random feature maps can significantly decrease the training and testing times of kernel-based algorithms without significantly lowering their accuracy.

piCholesky: Polynomial Interpolation of Multiple Cholesky Factors for Efficient Approximate Cross-Validation

no code implementations 2 Apr 2014 Da Kuang, Alex Gittens, Raffay Hamid

The dominant cost in solving least-squares problems using Newton's method is often that of factorizing the Hessian matrix over multiple values of the regularization parameter ($\lambda$).

Compact Random Feature Maps

no code implementations 17 Dec 2013 Raffay Hamid, Ying Xiao, Alex Gittens, Dennis Decoste

Kernel approximation using randomized feature maps has recently gained a lot of interest.

Spectral Clustering via the Power Method -- Provably

no code implementations 12 Nov 2013 Christos Boutsidis, Alex Gittens, Prabhanjan Kambadur

Spectral clustering is one of the most important algorithms in data mining and machine intelligence; however, its computational complexity limits its application to truly large scale data analysis.

Clustering

Revisiting the Nystrom Method for Improved Large-Scale Machine Learning

no code implementations 7 Mar 2013 Alex Gittens, Michael W. Mahoney

Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices.

BIG-bench Machine Learning
