Search Results for author: Igor Melnyk

Found 28 papers, 7 papers with code

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

no code implementations • 21 Apr 2023 • Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti, Jerret Ross, Yair Schiff, Radhika Vedpathak, Richard A. Young

We present an auditing framework that offers a holistic assessment of synthetic datasets and AI models trained on them, centered around bias and discrimination prevention, fidelity to the real data, utility, robustness, and privacy preservation.

Model Selection, Privacy Preserving

Knowledge Graph Generation From Text

1 code implementation • 18 Nov 2022 • Igor Melnyk, Pierre Dognin, Payel Das

In this work we propose a novel end-to-end multi-stage Knowledge Graph (KG) generation system from textual inputs, separating the overall process into two stages.

Graph Generation, Language Modelling

Reprogramming Large Pretrained Language Models for Antibody Sequence Infilling

no code implementations • 5 Oct 2022 • Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das

We introduce Reprogramming for Protein Sequence Infilling, a framework in which pretrained natural language models are repurposed via reprogramming to infill protein sequence templates, as a method of novel protein generation.

Specificity, Text Infilling

AlphaFold Distillation for Improved Inverse Protein Folding

no code implementations • 5 Oct 2022 • Igor Melnyk, Aurelie Lozano, Payel Das, Vijil Chenthamarakshan

In this work, we propose to perform knowledge distillation on the folding model's confidence metrics (e.g., pTM or pLDDT scores) to obtain a smaller, faster, and end-to-end differentiable distilled model, which can then be included as part of the structure-consistency-regularized inverse folding model training.

Drug Discovery, Knowledge Distillation +2
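The abstract above describes distilling a folding model's confidence score into a small differentiable surrogate. Below is a minimal, purely illustrative sketch of that distillation idea: a toy "student" regressor is trained to match a stand-in "teacher" confidence function. All names (`teacher_confidence`, `student`, the hydrophobicity heuristic) are hypothetical and are not from the paper.

```python
# Hypothetical sketch of confidence-metric distillation: a tiny "student"
# is trained to mimic an (expensive, here fake) folding-confidence score,
# so that it could later serve as a differentiable regularizer.

def teacher_confidence(seq):
    # Stand-in for a folding model's pLDDT-like confidence in [0, 1];
    # here just a toy heuristic: fraction of hydrophobic residues.
    return sum(1 for c in seq if c in "AILMFVW") / len(seq)

def student(seq, w, b):
    # Tiny differentiable student: linear model on the same feature.
    x = sum(1 for c in seq if c in "AILMFVW") / len(seq)
    return w * x + b

def distill_step(seqs, w, b, lr=0.5):
    # One gradient step on the mean squared student-teacher error.
    gw = gb = 0.0
    for s in seqs:
        err = student(s, w, b) - teacher_confidence(s)
        x = sum(1 for c in s if c in "AILMFVW") / len(s)
        gw += 2 * err * x / len(seqs)
        gb += 2 * err / len(seqs)
    return w - lr * gw, b - lr * gb

seqs = ["MKTAYIAKQR", "AVILLMFWAV", "GGSGGSGGSG"]
w, b = 0.0, 0.0
for _ in range(200):
    w, b = distill_step(seqs, w, b)
loss = sum((student(s, w, b) - teacher_confidence(s)) ** 2 for s in seqs)
```

In the paper's setting the student is a neural network and the teacher scores come from a real folding model; the sketch only shows the distillation loop shape.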

Benchmarking deep generative models for diverse antibody sequence design

no code implementations • 12 Nov 2021 • Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano

Here we consider three recently proposed deep generative frameworks for protein design: (AR) the sequence-based autoregressive generative model, (GVP) the precise structure-based graph neural network, and Fold2Seq that leverages a fuzzy and scale-free representation of a three-dimensional fold, while enforcing structure-to-sequence (and vice versa) consistency.


ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models

1 code implementation • EMNLP 2021 • Pierre L. Dognin, Inkit Padhi, Igor Melnyk, Payel Das

Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning.

Graph Generation, reinforcement-learning +2

Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

1 code implementation • 24 Jun 2021 • Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, Yang Shen

Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering.

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

1 code implementation • 21 Dec 2020 • Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, Brian Belgodere

Image captioning has recently demonstrated impressive progress, largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO.

Image Captioning, Navigate

Alleviating Noisy Data in Image Captioning with Cooperative Distillation

no code implementations • 21 Dec 2020 • Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff

Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or VizWiz that have accurate descriptions of their corresponding images.

Image Captioning

Tabular Transformers for Modeling Multivariate Time Series

1 code implementation • 3 Nov 2020 • Inkit Padhi, Yair Schiff, Igor Melnyk, Mattia Rigotti, Youssef Mroueh, Pierre Dognin, Jerret Ross, Ravi Nair, Erik Altman

This results in two architectures for tabular time series: one for learning representations that is analogous to BERT and can be pre-trained end-to-end and used in downstream tasks, and one that is akin to GPT and can be used for generation of realistic synthetic tabular sequences.

Fraud Detection, Synthetic Data Generation +1
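The abstract above contrasts a BERT-like masked objective with a GPT-like causal objective on tabular sequences. A toy illustration of the two target constructions (not the paper's code) on a quantized tabular row flattened into tokens; all token names are made up:

```python
# Toy illustration of the two training objectives on one tabular row
# whose fields have been quantized into discrete tokens.

row_tokens = ["age_30s", "amount_low", "merchant_7", "fraud_no"]

def bert_style(tokens, mask_idx, mask_token="[MASK]"):
    # Masked modeling: hide one field, predict it from the full context.
    inp = list(tokens)
    target = inp[mask_idx]
    inp[mask_idx] = mask_token
    return inp, target

def gpt_style(tokens):
    # Causal modeling: predict each token from its left context only,
    # which is what enables generation of synthetic rows token by token.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

masked_inp, masked_tgt = bert_style(row_tokens, 1)
causal_pairs = gpt_style(row_tokens)
```

The first form supports pre-training representations for downstream tasks; the second supports sampling realistic synthetic tabular sequences, matching the two architectures described above.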

DualTKB: A Dual Learning Bridge between Text and Knowledge Base

no code implementations • EMNLP 2020 • Pierre L. Dognin, Igor Melnyk, Inkit Padhi, Cicero Nogueira dos Santos, Payel Das

In this work, we present a dual learning approach for unsupervised text to path and path to text transfers in Commonsense Knowledge Bases (KBs).

Optimizing Mode Connectivity via Neuron Alignment

1 code implementation • NeurIPS 2020 • N. Joseph Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai

Yet, current curve finding algorithms do not consider the influence of symmetry in the loss surface created by model weight permutations.

Optimizing Loss Landscape Connectivity via Neuron Alignment

no code implementations • 25 Sep 2019 • N. Joseph Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai

Empirically, this initialization is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes.

Estimating Information Flow in DNNs

no code implementations • ICLR 2019 • Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Brian Kingsbury, Igor Melnyk, Nam Nguyen, Yury Polyanskiy

We then develop a rigorous estimator for I(X;T) in noisy DNNs and observe compression in various models.

Improved Adversarial Image Captioning

no code implementations • ICLR Workshop DeepGenStruct 2019 • Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu

In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions.

Image Captioning

Wasserstein Barycenter Model Ensembling

1 code implementation • 13 Feb 2019 • Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero dos Santos, Tom Sercu

In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters.

General Classification, Image Captioning +1
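The abstract above proposes ensembling model outputs via Wasserstein barycenters. A minimal sketch of the barycenter idea in one dimension, where the W2 barycenter has a closed form: average the models' quantile functions (here, sorted samples of equal size). This only illustrates the concept; the paper computes barycenters of categorical model outputs, not 1D empirical samples.

```python
# 1D W2 barycenter of equal-size empirical distributions: the k-th
# order statistic of the barycenter is the weighted average of the
# models' k-th order statistics (quantile averaging).

def w2_barycenter_1d(samples_per_model, weights):
    sorted_samples = [sorted(s) for s in samples_per_model]
    n = len(sorted_samples[0])
    return [sum(w * s[k] for w, s in zip(weights, sorted_samples))
            for k in range(n)]

bary = w2_barycenter_1d([[0, 1, 2], [2, 3, 4]], [0.5, 0.5])
# bary == [1.0, 2.0, 3.0]: the barycenter interpolates the two
# distributions geometrically, unlike a naive mixture of samples.
```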

Estimating Information Flow in Deep Neural Networks

no code implementations • 12 Oct 2018 • Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy

We then develop a rigorous estimator for $I(X;T)$ in noisy DNNs and observe compression in various models.

Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer

no code implementations • ACL 2018 • Cicero Nogueira dos Santos, Igor Melnyk, Inkit Padhi

We introduce a new approach to tackle the problem of offensive language in online social media.

Style Transfer, Text Style Transfer +1

Adversarial Semantic Alignment for Improved Image Captions

no code implementations • 30 Apr 2018 • Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu

When evaluated on the OOC and MS-COCO benchmarks, SCST-based training performs strongly in both semantic score and human evaluation, promising to be a valuable new approach for efficient discrete GAN training.

Image Captioning

Deep learning algorithm for data-driven simulation of noisy dynamical system

no code implementations • 22 Feb 2018 • Kyongmin Yeo, Igor Melnyk

It is shown that, when a numerical discretization is used, the function estimation problem can be recast as a multi-label classification problem.

Multi-Label Classification, Time Series Analysis
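The discretization idea above can be sketched in a few lines: a continuous state is binned, so predicting the next state becomes predicting a class label over the bins. This is a toy illustration under assumed ranges and bin counts, not the paper's model.

```python
# Binning a continuous state turns function estimation into
# classification over bin indices (toy parameters throughout).

def to_bin(x, lo=-2.0, hi=2.0, n_bins=8):
    # Map a continuous value to a bin index, clipped to [lo, hi).
    x = min(max(x, lo), hi - 1e-9)
    return int((x - lo) / (hi - lo) * n_bins)

def bin_center(k, lo=-2.0, hi=2.0, n_bins=8):
    # Recover a representative continuous value from a bin index.
    width = (hi - lo) / n_bins
    return lo + (k + 0.5) * width

# A noisy trajectory becomes a sequence of class labels, and a
# predicted label distribution becomes a distribution over states:
traj = [0.1, 0.5, 1.1, 1.6]
labels = [to_bin(x) for x in traj]
```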

Learning temporal evolution of probability distribution with Recurrent Neural Network

no code implementations • ICLR 2018 • Kyongmin Yeo, Igor Melnyk, Nam Nguyen, Eun Kyung Lee

We propose to tackle a time series regression problem by computing temporal evolution of a probability density function to provide a probabilistic forecast.

General Classification, regression +1
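The probabilistic-forecast idea above can be illustrated with a hedged sketch: instead of a point value, a model emits distribution parameters (here a Gaussian mean and standard deviation, an assumed parameterization, not the paper's), and forecasts are scored by negative log-likelihood.

```python
import math

def gaussian_nll(y, mu, sigma):
    # Negative log-likelihood of observation y under N(mu, sigma^2).
    return (0.5 * math.log(2 * math.pi * sigma ** 2)
            + (y - mu) ** 2 / (2 * sigma ** 2))

# A forecast close to the observation with honest uncertainty scores
# better (lower NLL) than an overconfident wrong one:
good = gaussian_nll(1.0, mu=1.1, sigma=0.5)
overconfident = gaussian_nll(1.0, mu=2.0, sigma=0.1)
```

Minimizing this NLL over a trajectory is one standard way to train a recurrent model to track the temporal evolution of a predictive distribution.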

Improved Neural Text Attribute Transfer with Non-parallel Data

no code implementations • 26 Nov 2017 • Igor Melnyk, Cicero Nogueira dos Santos, Kahini Wadhawan, Inkit Padhi, Abhishek Kumar

Text attribute transfer using non-parallel data requires methods that can perform disentanglement of content and linguistic attributes.

Disentanglement, Text Attribute Transfer

R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting

no code implementations • 10 Sep 2017 • Hardik Goel, Igor Melnyk, Arindam Banerjee

In many multivariate time series modeling problems, there is usually a significant linear dependency component, for which VARs are suitable, and a nonlinear component, for which RNNs are suitable.

Multivariate Time Series Forecasting
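The residual decomposition above can be sketched directly: fit a linear autoregressive model first, then a second (nonlinear) model would be trained on what the linear part cannot explain. A minimal illustration with an AR(1) fit in place of a full VAR; this is not the paper's implementation.

```python
# Fit the linear component, then expose residuals for a nonlinear
# model (the RNN in R2N2) to learn.

def fit_ar1(series):
    # Least-squares slope for x[t+1] ≈ a * x[t].
    num = sum(series[t] * series[t + 1] for t in range(len(series) - 1))
    den = sum(series[t] ** 2 for t in range(len(series) - 1))
    return num / den

series = [1.0, 0.5, 0.25, 0.125, 0.0625]
a = fit_ar1(series)                                  # linear component
residuals = [series[t + 1] - a * series[t]
             for t in range(len(series) - 1)]        # left for the RNN
```

On this purely geometric toy series the linear model explains everything and the residuals vanish; on real multivariate data the residuals carry the nonlinear structure.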

SenGen: Sentence Generating Neural Variational Topic Model

no code implementations • 1 Aug 2017 • Ramesh Nallapati, Igor Melnyk, Abhishek Kumar, Bo-Wen Zhou

We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder that is conditioned on the topic of the sentence.

Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems

no code implementations • 21 Feb 2016 • Igor Melnyk, Arindam Banerjee, Bryan Matthews, Nikunj Oza

In this context, the goal is to detect anomalous flight segments due to mechanical, environmental, or human factors, in order to identify operationally significant events, provide insight into flight operations, and highlight otherwise unavailable safety risks and precursors to accidents.

Anomaly Detection, Time Series Analysis

Estimating Structured Vector Autoregressive Model

no code implementations • 21 Feb 2016 • Igor Melnyk, Arindam Banerjee

While considerable advances have been made in estimating high-dimensional structured models from independent data using Lasso-type models, limited progress has been made for settings when the samples are dependent.

A Spectral Algorithm for Inference in Hidden Semi-Markov Models

no code implementations • 12 Jul 2014 • Igor Melnyk, Arindam Banerjee

Hidden semi-Markov models (HSMMs) are latent variable models which allow latent state persistence and can be viewed as a generalization of the popular hidden Markov models (HMMs).
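The state-persistence property described above is easy to see generatively: unlike an HMM, an HSMM samples an explicit duration for each latent state and holds the state for that many steps. A toy sketch with made-up, deterministic parameters (fixed durations, single-successor transitions):

```python
import random

def sample_hsmm(n_steps, trans, durations, seed=0):
    # trans: successor-state table; durations: dwell time per state.
    # A real HSMM draws durations from a per-state distribution.
    rng = random.Random(seed)
    states, s = [], 0
    while len(states) < n_steps:
        states.extend([s] * durations[s])  # persist in state s
        s = rng.choice(trans[s])           # then jump to a successor
    return states[:n_steps]

path = sample_hsmm(8, trans={0: [1], 1: [0]}, durations={0: 3, 1: 2})
# path == [0, 0, 0, 1, 1, 0, 0, 0]
```

With geometric durations this collapses back to an ordinary HMM, which is the sense in which HSMMs generalize HMMs.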
