1 code implementation • Findings (ACL) 2022 • Naoya Inoue, Charuta Pethe, Allen Kim, Steven Skiena
We address the problem of learning fixed-length vector representations of characters in novels.
1 code implementation • 2 Jun 2023 • Baojian Zhou, Steven Skiena
To better understand the value of optimizing for AUC, we present an efficient algorithm, namely AUC-opt, to find the provably optimal AUC linear classifier in $\mathbb{R}^2$, which runs in $\mathcal{O}(n_+ n_- \log (n_+ n_-))$ where $n_+$ and $n_-$ are the number of positive and negative samples respectively.
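For readers unfamiliar with the objective, the quantity being optimized can be sketched as follows. This is not AUC-opt itself, only a brute-force evaluation of the AUC of a given linear scorer $x \mapsto w \cdot x$, counting correctly ranked positive/negative pairs (there are $n_+ n_-$ such pairs). The function names and the toy data are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def linear_scores(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Score each sample with the linear function w . x."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]


def pairwise_auc(w: Sequence[float],
                 X: Sequence[Sequence[float]],
                 y: Sequence[int]) -> float:
    """AUC of the scorer w . x, computed by comparing every
    (positive, negative) pair; ties count as half a correct pair."""
    scores = linear_scores(w, X)
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data in R^2: two positives, two negatives.
X_toy = [(2.0, 0.0), (1.0, 1.0), (0.0, 2.0), (-1.0, 0.0)]
y_toy = [1, 1, 0, 0]

auc_good = pairwise_auc((1.0, 0.0), X_toy, y_toy)  # ranks every positive above every negative
auc_poor = pairwise_auc((0.0, 1.0), X_toy, y_toy)  # mostly misranks the pairs
```

An optimal-AUC linear classifier is a direction `w` that maximizes this value; the sketch only evaluates a candidate direction and makes no attempt to search for the best one.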
no code implementations • 16 Dec 2022 • Giorgian Borca-Tasciuc, Xingzhi Guo, Stanley Bak, Steven Skiena
Machine learning models are increasingly deployed for critical decision-making tasks, making it important to verify that they do not contain gender or racial biases picked up from training data.
no code implementations • 2 Nov 2022 • Xingzhi Guo, Steven Skiena
Word and graph embeddings are widely used in deep learning applications.
no code implementations • 29 Sep 2021 • Xingzhi Guo, Baojian Zhou, Haochen Chen, Sergiy Verstyuk, Steven Skiena
The power of embedding representations is a curious phenomenon.
1 code implementation • EMNLP 2020 • Allen Kim, Charuta Pethe, Steven Skiena
Recognizing the flow of time in a story is a crucial aspect of understanding it.
1 code implementation • EMNLP 2020 • Charuta Pethe, Allen Kim, Steven Skiena
Books are typically segmented into chapters and sections, representing coherent subnarratives and topics.
1 code implementation • 23 Sep 2020 • Baojian Zhou, Yiming Ying, Steven Skiena
The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification arising from many application domains where high-dimensional sparse data is abundant.
no code implementations • IJCNLP 2019 • Charuta Pethe, Steven Skiena
Indeed, we demonstrate a statistically significant correlation between this score and tweet popularity (likes/replies/retweets) for 13 of the 15 celebrities in our study.
1 code implementation • 30 Aug 2019 • Haochen Chen, Syed Fahad Sultan, Yingtao Tian, Muhao Chen, Steven Skiena
Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction.
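The two ingredients named above can be illustrated with a small pure-Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the sparsity parameter `s` is set to the square root of the number of nodes, degree normalization is done by simple neighbour averaging, and the per-step `weights` are hypothetical. Multiplying the random matrix by the normalized adjacency repeatedly avoids ever materializing the full node similarity matrix.

```python
import math
import random
from typing import List, Sequence


def very_sparse_projection(n_rows: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Random matrix whose entries are +sqrt(s), 0, -sqrt(s) with
    probabilities 1/(2s), 1 - 1/s, 1/(2s). Assumption: s = sqrt(n_rows)."""
    rng = random.Random(seed)
    s = math.sqrt(n_rows)
    scale = math.sqrt(s)
    matrix = []
    for _ in range(n_rows):
        row = []
        for _ in range(dim):
            u = rng.random()
            if u < 1.0 / (2.0 * s):
                row.append(scale)
            elif u < 1.0 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def propagate(adj: Sequence[Sequence[int]], M: List[List[float]]) -> List[List[float]]:
    """One degree-normalized step: each node averages its neighbours' rows."""
    dim = len(M[0])
    out = []
    for nbrs in adj:
        row = [0.0] * dim
        for v in nbrs:
            for j in range(dim):
                row[j] += M[v][j]
        if nbrs:
            row = [x / len(nbrs) for x in row]
        out.append(row)
    return out


def sketch_embeddings(adj: Sequence[Sequence[int]], dim: int,
                      weights: Sequence[float] = (1.0, 1.0),
                      seed: int = 0) -> List[List[float]]:
    """Weighted sum of 1-step, 2-step, ... propagations of a sparse random matrix."""
    current = very_sparse_projection(len(adj), dim, seed)
    emb = [[0.0] * dim for _ in adj]
    for w in weights:
        current = propagate(adj, current)  # captures one more hop of transitivity
        for u in range(len(adj)):
            for j in range(dim):
                emb[u][j] += w * current[u][j]
    return emb


# Hypothetical toy graph: a path 0 - 1 - 2, given as adjacency lists.
path_graph = [[1], [0, 2], [1]]
emb = sketch_embeddings(path_graph, dim=4)
```

In this toy graph, nodes 0 and 2 have the same neighbourhood, so they receive identical embedding rows, which is the kind of structural similarity the projection is meant to preserve.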
no code implementations • 12 May 2019 • Junting Ye, Steven Skiena
We find it interesting that adding name embeddings can further improve the performance of models using demographic features, which are traditionally used for lifespan modeling.
no code implementations • 18 Mar 2019 • Junting Ye, Steven Skiena
We base our algorithmic analysis on four properties journalists have established to be associated with reporting quality: peer reputation, reporting bias / breadth, bottomline financial pressure, and popularity.
1 code implementation • 13 Sep 2018 • Haochen Chen, Xiaofei Sun, Yingtao Tian, Bryan Perozzi, Muhao Chen, Steven Skiena
Network embedding methods aim at learning low-dimensional latent representations of nodes in a network.
Social and Information Networks, Physics and Society
no code implementations • EMNLP 2018 • Vivek Kulkarni, Junting Ye, Steven Skiena, William Yang Wang
A news article's title, content and link structure often reveal its political ideology.
1 code implementation • CONLL 2019 • Muhao Chen, Yingtao Tian, Haochen Chen, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo
Bilingual word embeddings have been widely used to capture the similarity of lexical semantics in different human languages.
2 code implementations • 8 Aug 2018 • Haochen Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena
We further demonstrate the applications of network embeddings, and conclude the survey with future work in this area.
Social and Information Networks
no code implementations • 18 Jun 2018 • Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo
Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions.
1 code implementation • ICLR 2018 • Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, Le Song
Deep generative models have been enjoying success in modeling continuous data.
1 code implementation • 25 Aug 2017 • Junting Ye, Shuchu Han, Yifan Hu, Baris Coskun, Meizhu Liu, Hong Qin, Steven Skiena
Through our analysis of 57M contact lists from a major Internet company, we are able to design a fine-grained nationality classifier covering 39 groups representing over 90% of the world population.
3 code implementations • 23 Jun 2017 • Haochen Chen, Bryan Perozzi, Yifan Hu, Steven Skiena
We present HARP, a novel method for learning low dimensional embeddings of a graph's nodes which preserves higher-order structural features.
Social and Information Networks
no code implementations • 22 May 2017 • Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz
Taking advantage of linguistic information available through Facebook, we study the process of inferring a new set of potential human traits based on unprompted language use.
no code implementations • 24 Apr 2017 • Yanqing Chen, Steven Skiena
Wikipedia is a useful knowledge source that benefits many applications in language processing and knowledge representation.
no code implementations • 21 Nov 2016 • Yanqing Chen, Steven Skiena
Transliterations play an important role in multilingual entity reference resolution, because proper names increasingly travel between languages in news and social media.
no code implementations • 12 May 2016 • Yingtao Tian, Vivek Kulkarni, Bryan Perozzi, Steven Skiena
Do word embeddings converge to learn similar things over different initializations?
2 code implementations • 6 May 2016 • Bryan Perozzi, Vivek Kulkarni, Haochen Chen, Steven Skiena
We present Walklets, a novel approach for learning multiscale representations of vertices in a network.
Social and Information Networks, Physics and Society
2 code implementations • 22 Oct 2015 • Vivek Kulkarni, Bryan Perozzi, Steven Skiena
Our analysis of British and American English over a period of 100 years reveals that semantic variation between these dialects is shrinking.
no code implementations • 12 Nov 2014 • Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, Steven Skiena
We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words.
no code implementations • 14 Oct 2014 • Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, Steven Skiena
We describe a system that builds Named Entity Recognition (NER) annotators for 40 major languages using Wikipedia and Freebase.
no code implementations • 5 Apr 2014 • Vivek Kulkarni, Rami Al-Rfou', Bryan Perozzi, Steven Skiena
We evaluate the performance of training the model on the GPU and present optimizations that boost the performance on the GPU. One of the key optimizations we propose increases the performance of a function involved in calculating and updating the gradient by approximately 50 times on the GPU for sufficiently large batch sizes.
14 code implementations • 26 Mar 2014 • Bryan Perozzi, Rami Al-Rfou, Steven Skiena
We present DeepWalk, a novel approach for learning latent representations of vertices in a network.
Ranked #1 on Link Property Prediction on ogbl-ddi
no code implementations • 6 Mar 2014 • Bryan Perozzi, Rami Al-Rfou, Vivek Kulkarni, Steven Skiena
Recent advancements in unsupervised feature learning have developed powerful latent representations of words.
no code implementations • WS 2013 • Rami Al-Rfou, Bryan Perozzi, Steven Skiena
We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part of speech tagger for a subset of these languages.
no code implementations • 15 Jan 2013 • Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena
We seek to better understand the difference in quality of the several publicly released embeddings.
no code implementations • COLING 2012 • Rami Al-Rfou', Steven Skiena
Online content analysis employs algorithmic methods to identify entities in unstructured text.