Search Results for author: Murray Patterson

Found 24 papers, 7 papers with code

Expanding Chemical Representation with k-mers and Fragment-based Fingerprints for Molecular Fingerprinting

no code implementations • 28 Mar 2024 • Sarwan Ali, Prakash Chourasia, Murray Patterson

This study introduces a novel approach, combining substructure counting, $k$-mers, and Daylight-like fingerprints, to expand the representation of chemical structures in SMILES strings.

Drug Discovery

A Universal Non-Parametric Approach For Improved Molecular Sequence Analysis

no code implementations • 12 Feb 2024 • Sarwan Ali, Tamkanat E Ali, Prakash Chourasia, Murray Patterson

In this work, we present a novel compression-based approach, motivated by \cite{jiang2023low}, which combines the simplicity of basic compression algorithms such as Gzip and Bz2 with the Normalized Compression Distance (NCD) to achieve better performance on classification tasks without relying on handcrafted features or pre-trained models.
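As a rough illustration of the general idea (not the authors' implementation), NCD between two sequences can be computed from compressed lengths as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The sketch below uses Python's standard `gzip` and `bz2` modules; the function names `ncd` and `nearest_neighbor_label` and the 1-nearest-neighbor classification step are assumptions made for this example.

```python
import bz2
import gzip


def ncd(x: str, y: str, compressor=gzip.compress) -> float:
    """Normalized Compression Distance between two strings.

    `compressor` is any bytes -> bytes function, e.g. gzip.compress
    or bz2.compress.
    """
    bx, by = x.encode(), y.encode()
    cx = len(compressor(bx))        # compressed length of x
    cy = len(compressor(by))        # compressed length of y
    cxy = len(compressor(bx + by))  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest_neighbor_label(query: str, train, compressor=gzip.compress):
    """Return the label of the training sequence closest to `query` under NCD.

    `train` is an iterable of (sequence, label) pairs. This 1-NN rule is a
    hypothetical stand-in for whatever classifier is used downstream.
    """
    _, label = min(train, key=lambda item: ncd(query, item[0], compressor))
    return label
```

Passing `compressor=bz2.compress` swaps the compressor without changing the rest of the code. No features are extracted and no model is trained; the only cost is one compression call per pair of sequences.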

Sequence-Based Nanobody-Antigen Binding Prediction

no code implementations • 15 Jul 2023 • Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdad Ullah Khan, Murray Patterson

We curated a comprehensive dataset of Nanobody-Antigen binding and nonbinding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of nanobody and antigen.

Robust Brain Age Estimation via Regression Models and MRI-derived Features

no code implementations • 8 Jun 2023 • Mansoor Ahmed, Usama Sardar, Sarwan Ali, Shafiq Alam, Murray Patterson, Imdad Ullah Khan

The proposed BAE framework provides a new approach for estimating brain age, which has important implications for the understanding of neurological disorders and age-related brain changes.

Age Estimation regression

T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification

1 code implementation • 25 Apr 2023 • Zahra Tayebi, Sarwan Ali, Prakash Chourasia, Taslim Murad, Murray Patterson

Sparse coding is a popular machine learning technique that represents data with a set of informative features. It can capture complex relationships between amino acids and identify subtle patterns in the sequence that might be missed by low-dimensional methods.

Multi-class Classification Specificity

Virus2Vec: Viral Sequence Classification Using Machine Learning

no code implementations • 24 Apr 2023 • Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Pin-Yu Chen, Imdad Ullah Khan, Murray Patterson

Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans.

Classification Specificity

PCD2Vec: A Poisson Correction Distance-Based Approach for Viral Host Classification

no code implementations • 13 Apr 2023 • Sarwan Ali, Taslim Murad, Murray Patterson

Therefore, the usage of only the spike protein, instead of the full genome, provides most of the essential information for performing analyses such as host classification.

Specificity

ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation

1 code implementation • 6 Apr 2023 • Sarwan Ali, Prakash Chourasia, Zahra Tayebi, Babatunde Bello, Murray Patterson

In this work, we propose ViralVectors, a compact feature vector generation from virome sequencing data that allows effective downstream analysis.

4k Decision Making

BioSequence2Vec: Efficient Embedding Generation For Biological Sequences

no code implementations • 1 Apr 2023 • Sarwan Ali, Usama Sardar, Murray Patterson, Imdad Ullah Khan

Kernel-based methods, e.g., SVM, are a proven, efficient, and useful alternative for several machine learning (ML) tasks such as sequence classification.

Representation Learning

Exploring The Potential Of GANs In Biological Sequence Analysis

no code implementations • 4 Mar 2023 • Taslim Murad, Sarwan Ali, Murray Patterson

Machine learning (ML) technologies provide new tools for biological sequence analysis, making it possible to effectively analyze the functions and structures of the sequences.

Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning

no code implementations • 17 Feb 2023 • Prakash Chourasia, Taslim Murad, Zahra Tayebi, Sarwan Ali, Imdad Ullah Khan, Murray Patterson

This paper presents a federated learning (FL) approach to train an AI model for SARS-CoV-2 variant classification.

Federated Learning Privacy Preserving

Anderson Acceleration For Bioinformatics-Based Machine Learning

no code implementations • 1 Feb 2023 • Sarwan Ali, Prakash Chourasia, Murray Patterson

Anderson acceleration (AA) is a well-known method for accelerating the convergence of iterative algorithms, with applications in various fields including deep learning and optimization.

Informative Initialization and Kernel Selection Improves t-SNE for Biological Sequences

1 code implementation • 16 Nov 2022 • Prakash Chourasia, Sarwan Ali, Murray Patterson

We show that, by using techniques such as informed initialization and kernel matrix selection, t-SNE performs significantly better.

Reads2Vec: Efficient Embedding of Raw High-Throughput Sequencing Reads Data

no code implementations • 15 Nov 2022 • Prakash Chourasia, Sarwan Ali, Simone Ciccolella, Gianluca Della Vedova, Murray Patterson

As a result, new methods such as Pangolin, which can scale to the millions of samples of SARS-CoV-2 currently available, have appeared.

Clustering Vocal Bursts Intensity Prediction

Efficient Approximate Kernel Based Spike Sequence Classification

no code implementations • 11 Sep 2022 • Sarwan Ali, Bikram Sahoo, Muhammad Asad Khan, Alexander Zelikovsky, Imdad Ullah Khan, Murray Patterson

More specifically, we improve the quality of the approximate kernel using domain knowledge (computed using information gain) and efficient preprocessing (using minimizers computation) to classify coronavirus spike protein sequences corresponding to different variants (e.g., Alpha, Beta, Gamma).

Classification Clustering

Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification

1 code implementation • 18 Jul 2022 • Sarwan Ali, Bikram Sahoo, Alexander Zelikovskiy, Pin-Yu Chen, Murray Patterson

The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome: millions of sequences and counting.

Benchmarking BIG-bench Machine Learning +1

PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences

no code implementations • 6 Jan 2022 • Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Yijing Zhou, Murray Patterson

In coronaviruses, the surface (S) protein, or spike protein, is an important part of determining host specificity since it is the point of contact between the virus and the host cell membrane.

Open-Ended Question Answering Specificity

Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants

1 code implementation • 18 Oct 2021 • Zahra Tayebi, Sarwan Ali, Murray Patterson

We then show that with the appropriate feature selection, we can efficiently and effectively cluster the spike sequences based on the different variants.

Clustering feature selection

Efficient Analysis of COVID-19 Clinical Data using Machine Learning Models

no code implementations • 18 Oct 2021 • Sarwan Ali, Yijing Zhou, Murray Patterson

Applying machine learning-based algorithms to this big data is a natural approach to this aim, since they can quickly scale to such data and extract the relevant information in the presence of variety and different levels of veracity.

BIG-bench Machine Learning feature selection

Characterizing SARS-CoV-2 Spike Sequences Based on Geographical Location

1 code implementation • 2 Oct 2021 • Sarwan Ali, Babatunde Bello, Zahra Tayebi, Murray Patterson

With the rapid spread of COVID-19 worldwide, viral genomic data is available on the order of millions of sequences in public databases such as GISAID.

Benchmarking Machine Learning Robustness in Covid-19 Spike Sequence Classification

no code implementations • 29 Sep 2021 • Sarwan Ali, Bikram Sahoo, Pin-Yu Chen, Murray Patterson

The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 viral genome: millions of sequences and counting.

Benchmarking BIG-bench Machine Learning +1

Spike2Vec: An Efficient and Scalable Embedding Approach for COVID-19 Spike Sequences

1 code implementation • 12 Sep 2021 • Sarwan Ali, Murray Patterson

Through experiments, we show that Spike2Vec is not only scalable on several million spike sequences, but also outperforms the baseline models in terms of prediction accuracy, F1 score, etc.

Effective and scalable clustering of SARS-CoV-2 sequences

no code implementations • 18 Aug 2021 • Sarwan Ali, Tamkanat-E-Ali, Muhammad Asad Khan, Imdadullah Khan, Murray Patterson

Using a $k$-mer based feature vector generation and efficient feature selection methods, our approach is effective in identifying variants, as well as being efficient and scalable to millions of sequences.
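A minimal sketch of what a $k$-mer count vector can look like, assuming a fixed alphabet and overlapping windows of length $k$. The function name `kmer_spectrum`, the default nucleotide alphabet, and the default `k` are choices made for this example and are not taken from the paper; for protein sequences the alphabet would be the 20 amino acid letters.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(sequence: str, k: int = 3, alphabet: str = "ACGT") -> list[int]:
    """Count overlapping k-mers and return a fixed-length vector.

    The vector has len(alphabet) ** k entries, one per possible k-mer,
    ordered as produced by itertools.product over the alphabet.
    """
    # Map every possible k-mer to a position in the output vector.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    # Slide a window of length k over the sequence and count each substring.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vector = [0] * len(index)
    for kmer, count in counts.items():
        if kmer in index:  # ignore k-mers containing symbols outside the alphabet
            vector[index[kmer]] = count
    return vector
```

For example, `kmer_spectrum("ACGTAC", k=2)` yields a 16-entry vector in which the entry for `AC` is 2 and the entries for `CG`, `GT`, and `TA` are 1. Because the vector length grows as the alphabet size to the power $k$, a feature selection step of the kind the abstract mentions is a natural follow-up.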

Clustering feature selection

A k-mer Based Approach for SARS-CoV-2 Variant Identification

no code implementations • 7 Aug 2021 • Sarwan Ali, Bikram Sahoo, Naimat Ullah, Alexander Zelikovskiy, Murray Patterson, Imdadullah Khan

With the rapid spread of the novel coronavirus (COVID-19) across the globe and its continuous mutation, it is of pivotal importance to design a system to identify different known (and unknown) variants of SARS-CoV-2.

