Search Results for author: Steven Skiena

Found 40 papers, 16 papers with code

DeepWalk: Online Learning of Social Representations

14 code implementations 26 Mar 2014 Bryan Perozzi, Rami Al-Rfou, Steven Skiena

We present DeepWalk, a novel approach for learning latent representations of vertices in a network.

Anomaly Detection Language Modelling +1

Don't Walk, Skip! Online Learning of Multi-scale Network Embeddings

2 code implementations 6 May 2016 Bryan Perozzi, Vivek Kulkarni, Haochen Chen, Steven Skiena

We present Walklets, a novel approach for learning multiscale representations of vertices in a network.

Social and Information Networks Physics and Society

Fast and Accurate Network Embeddings via Very Sparse Random Projection

2 code implementations 30 Aug 2019 Haochen Chen, Syed Fahad Sultan, Yingtao Tian, Muhao Chen, Steven Skiena

Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction.

Dimensionality Reduction Network Embedding

HARP: Hierarchical Representation Learning for Networks

3 code implementations 23 Jun 2017 Haochen Chen, Bryan Perozzi, Yifan Hu, Steven Skiena

We present HARP, a novel method for learning low dimensional embeddings of a graph's nodes which preserves higher-order structural features.

Social and Information Networks

Chapter Captor: Text Segmentation in Novels

1 code implementation EMNLP 2020 Charuta Pethe, Allen Kim, Steven Skiena

Books are typically segmented into chapters and sections, representing coherent subnarratives and topics.

Segmentation Text Segmentation

Enhanced Network Embeddings via Exploiting Edge Labels

1 code implementation 13 Sep 2018 Haochen Chen, Xiaofei Sun, Yingtao Tian, Bryan Perozzi, Muhao Chen, Steven Skiena

Network embedding methods aim at learning low-dimensional latent representations of nodes in a network.

Social and Information Networks Physics and Society

Freshman or Fresher? Quantifying the Geographic Variation of Internet Language

2 code implementations 22 Oct 2015 Vivek Kulkarni, Bryan Perozzi, Steven Skiena

Our analysis of British and American English over a period of 100 years reveals that semantic variation between these dialects is shrinking.

Analyzing Film Adaptation through Narrative Alignment

1 code implementation 7 Nov 2023 Tanzir Pial, Shahreen Salim, Charuta Pethe, Allen Kim, Steven Skiena

Novels are often adapted into feature films, but the differences between the two media usually require dropping sections of the source text from the movie script.

text similarity

Nationality Classification Using Name Embeddings

1 code implementation 25 Aug 2017 Junting Ye, Shuchu Han, Yifan Hu, Baris Coskun, Meizhu Liu, Hong Qin, Steven Skiena

Through our analysis of 57M contact lists from a major Internet company, we are able to design a fine-grained nationality classifier covering 39 groups representing over 90% of the world population.

Classification General Classification

A Tutorial on Network Embeddings

2 code implementations 8 Aug 2018 Haochen Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena

We further demonstrate the applications of network embeddings, and conclude the survey with future work in this area.

Social and Information Networks

What time is it? Temporal Analysis of Novels

1 code implementation EMNLP 2020 Allen Kim, Charuta Pethe, Steven Skiena

Recognizing the flow of time in a story is a crucial aspect of understanding it.

Descriptive

Online AUC Optimization for Sparse High-Dimensional Datasets

1 code implementation 23 Sep 2020 Baojian Zhou, Yiming Ying, Steven Skiena

The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification arising from many application domains where high-dimensional sparse data is abundant.

imbalanced classification Vocal Bursts Intensity Prediction

Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment

no code implementations 18 Jun 2018 Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo

Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions.

Entity Alignment Knowledge Graphs

Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach

no code implementations 22 May 2017 Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz

Taking advantage of linguistic information available through Facebook, we study the process of inferring a new set of potential human traits based on unprompted language use.

Recognizing Descriptive Wikipedia Categories for Historical Figures

no code implementations 24 Apr 2017 Yanqing Chen, Steven Skiena

Wikipedia is a useful knowledge source that benefits many applications in language processing and knowledge representation.

Descriptive Information Retrieval +2

False-Friend Detection and Entity Matching via Unsupervised Transliteration

no code implementations 21 Nov 2016 Yanqing Chen, Steven Skiena

Transliterations play an important role in multilingual entity reference resolution, because proper names increasingly travel between languages in news and social media.

Translation Transliteration

On the Convergent Properties of Word Embedding Methods

no code implementations 12 May 2016 Yingtao Tian, Vivek Kulkarni, Bryan Perozzi, Steven Skiena

Do word embeddings converge to learn similar things over different initializations?

Word Embeddings

Statistically Significant Detection of Linguistic Change

no code implementations 12 Nov 2014 Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, Steven Skiena

We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words.

Change Point Detection Time Series +1

POLYGLOT-NER: Massive Multilingual Named Entity Recognition

no code implementations 14 Oct 2014 Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, Steven Skiena

We describe a system that builds Named Entity Recognition (NER) annotators for 40 major languages using Wikipedia and Freebase.

Information Retrieval Machine Translation +7

Inducing Language Networks from Continuous Space Word Representations

no code implementations 6 Mar 2014 Bryan Perozzi, Rami Al-Rfou, Vivek Kulkarni, Steven Skiena

Recent advancements in unsupervised feature learning have developed powerful latent representations of words.

Polyglot: Distributed Word Representations for Multilingual NLP

no code implementations WS 2013 Rami Al-Rfou, Bryan Perozzi, Steven Skiena

We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part of speech tagger for a subset of these languages.

Language Modelling Multilingual NLP +1

Exploring the power of GPU's for training Polyglot language models

no code implementations 5 Apr 2014 Vivek Kulkarni, Rami Al-Rfou', Bryan Perozzi, Steven Skiena

We evaluate the performance of training the model on the GPU and present optimizations that boost that performance. One of the key optimizations we propose speeds up a function involved in calculating and updating the gradient by approximately 50 times on the GPU for sufficiently large batch sizes.

The Expressive Power of Word Embeddings

no code implementations 15 Jan 2013 Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena

We seek to better understand the difference in quality of the several publicly released embeddings.

Benchmarking Sentence +1

MediaRank: Computational Ranking of Online News Sources

no code implementations 18 Mar 2019 Junting Ye, Steven Skiena

We base our algorithmic analysis on four properties journalists have established to be associated with reporting quality: peer reputation, reporting bias / breadth, bottomline financial pressure, and popularity.

The Secret Lives of Names? Name Embeddings from Social Media

no code implementations 12 May 2019 Junting Ye, Steven Skiena

We find it interesting that adding name embeddings can further improve the performances of models using demographic features, which are traditionally used for lifespan modeling.

The Trumpiest Trump? Identifying a Subject's Most Characteristic Tweets

no code implementations IJCNLP 2019 Charuta Pethe, Steven Skiena

Indeed, we demonstrate a statistically significant correlation between this score and tweet popularity (likes/replies/retweets) for 13 of the 15 celebrities in our study.

Why do embedding spaces look as they do?

no code implementations 29 Sep 2021 Xingzhi Guo, Baojian Zhou, Haochen Chen, Sergiy Verstyuk, Steven Skiena

The power of embedding representations is a curious phenomenon.

Hierarchies over Vector Space: Orienting Word and Graph Embeddings

no code implementations 2 Nov 2022 Xingzhi Guo, Steven Skiena

Word and graph embeddings are widely used in deep learning applications.

Provable Fairness for Neural Network Models using Formal Verification

no code implementations 16 Dec 2022 Giorgian Borca-Tasciuc, Xingzhi Guo, Stanley Bak, Steven Skiena

Machine learning models are increasingly deployed for critical decision-making tasks, making it important to verify that they do not contain gender or racial biases picked up from training data.

Decision Making Fairness

Does it pay to optimize AUC?

1 code implementation 2 Jun 2023 Baojian Zhou, Steven Skiena

To better understand the value of optimizing for AUC, we present an efficient algorithm, namely AUC-opt, to find the provably optimal AUC linear classifier in $\mathbb{R}^2$, which runs in $\mathcal{O}(n_+ n_- \log (n_+ n_-))$ where $n_+$ and $n_-$ are the number of positive and negative samples respectively.

Prosody Analysis of Audiobooks

no code implementations 10 Oct 2023 Charuta Pethe, Yunting Yin, Steven Skiena

Recent advances in text-to-speech have made it possible to generate natural-sounding audio from text.

Attribute Language Modelling +1

GNAT: A General Narrative Alignment Tool

no code implementations 7 Nov 2023 Tanzir Pial, Steven Skiena

Algorithmic sequence alignment identifies similar segments shared between pairs of documents, and is fundamental to many NLP tasks.

text similarity

STONYBOOK: A System and Resource for Large-Scale Analysis of Novels

no code implementations 6 Nov 2023 Charuta Pethe, Allen Kim, Rajesh Prabhakar, Tanzir Pial, Steven Skiena

Books have historically been the primary mechanism through which narratives are transmitted.

Word Definitions from Large Language Models

no code implementations 10 Nov 2023 Yunting Yin, Steven Skiena

Dictionary definitions are historically the arbitrator of what words mean, but this primacy has come under threat by recent progress in NLP, including word embeddings and generative models like ChatGPT.

Word Embeddings
