Search Results for author: Murray Patterson

Found 24 papers, 7 papers with code

Expanding Chemical Representation with k-mers and Fragment-based Fingerprints for Molecular Fingerprinting

no code implementations • 28 Mar 2024 • Sarwan Ali, Prakash Chourasia, Murray Patterson

This study introduces a novel approach, combining substructure counting, $k$-mers, and Daylight-like fingerprints, to expand the representation of chemical structures in SMILES strings.

Drug Discovery

A Universal Non-Parametric Approach For Improved Molecular Sequence Analysis

no code implementations • 12 Feb 2024 • Sarwan Ali, Tamkanat E Ali, Prakash Chourasia, Murray Patterson

In this work, we present a novel approach based on a compression-based model, motivated by \cite{jiang2023low}, which combines the simplicity of basic compression algorithms such as Gzip and Bz2 with the Normalized Compression Distance (NCD) to achieve better performance on classification tasks without relying on handcrafted features or pre-trained models.
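
As a rough illustration of the compression-based idea in this abstract, the sketch below computes a Normalized Compression Distance with Gzip or Bz2. It assumes the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length in bytes. The function names `compressed_size` and `ncd` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import bz2
import gzip


def compressed_size(data: bytes, method: str = "gzip") -> int:
    """Return the length in bytes of `data` after compression."""
    if method == "gzip":
        return len(gzip.compress(data))
    if method == "bz2":
        return len(bz2.compress(data))
    raise ValueError(f"unknown compression method: {method}")


def ncd(x: str, y: str, method: str = "gzip") -> float:
    """Normalized Compression Distance between two sequences.

    Assumes the standard definition; smaller values suggest that the
    two sequences share more compressible structure.
    """
    cx = compressed_size(x.encode(), method)
    cy = compressed_size(y.encode(), method)
    cxy = compressed_size((x + y).encode(), method)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A pairwise NCD matrix computed this way could then feed a distance-based classifier such as k-nearest neighbours, which is one common way to use such distances without handcrafted features.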

Sequence-Based Nanobody-Antigen Binding Prediction

no code implementations • 15 Jul 2023 • Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdad Ullah Khan, Murray Patterson

We curated a comprehensive dataset of Nanobody-Antigen binding and nonbinding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of nanobody and antigen.

Robust Brain Age Estimation via Regression Models and MRI-derived Features

no code implementations • 8 Jun 2023 • Mansoor Ahmed, Usama Sardar, Sarwan Ali, Shafiq Alam, Murray Patterson, Imdad Ullah Khan

The proposed BAE framework provides a new approach for estimating brain age, which has important implications for the understanding of neurological disorders and age-related brain changes.

Age Estimation regression

T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification

1 code implementation • 25 Apr 2023 • Zahra Tayebi, Sarwan Ali, Prakash Chourasia, Taslim Murad, Murray Patterson

Sparse coding is a popular technique in machine learning that enables the representation of data with a set of informative features and can capture complex relationships between amino acids and identify subtle patterns in the sequence that might be missed by low-dimensional methods.

Multi-class Classification Specificity

Virus2Vec: Viral Sequence Classification Using Machine Learning

no code implementations • 24 Apr 2023 • Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Pin-Yu Chen, Imdad Ullah Khan, Murray Patterson

Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans.

Classification Specificity

PCD2Vec: A Poisson Correction Distance-Based Approach for Viral Host Classification

no code implementations • 13 Apr 2023 • Sarwan Ali, Taslim Murad, Murray Patterson

Therefore, the usage of only the spike protein, instead of the full genome, provides most of the essential information for performing analyses such as host classification.

Specificity

ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation

1 code implementation • 6 Apr 2023 • Sarwan Ali, Prakash Chourasia, Zahra Tayebi, Babatunde Bello, Murray Patterson

In this work, we propose ViralVectors, a compact feature vector generation from virome sequencing data that allows effective downstream analysis.

4k Decision Making

BioSequence2Vec: Efficient Embedding Generation For Biological Sequences

no code implementations • 1 Apr 2023 • Sarwan Ali, Usama Sardar, Murray Patterson, Imdad Ullah Khan

Kernel-based methods, e.g., SVM, are a proven, efficient, and useful alternative for several machine learning (ML) tasks such as sequence classification.

Representation Learning

Exploring The Potential Of GANs In Biological Sequence Analysis

no code implementations • 4 Mar 2023 • Taslim Murad, Sarwan Ali, Murray Patterson

New tools for biological sequence analysis are provided by machine learning (ML) technologies to effectively analyze the functions and structures of the sequences.

Anderson Acceleration For Bioinformatics-Based Machine Learning

no code implementations • 1 Feb 2023 • Sarwan Ali, Prakash Chourasia, Murray Patterson

Anderson acceleration (AA) is a well-known method for accelerating the convergence of iterative algorithms, with applications in various fields including deep learning and optimization.

Informative Initialization and Kernel Selection Improves t-SNE for Biological Sequences

1 code implementation • 16 Nov 2022 • Prakash Chourasia, Sarwan Ali, Murray Patterson

We show that with techniques such as informed initialization and kernel matrix selection, t-SNE performs significantly better.

Reads2Vec: Efficient Embedding of Raw High-Throughput Sequencing Reads Data

no code implementations • 15 Nov 2022 • Prakash Chourasia, Sarwan Ali, Simone Ciccolella, Gianluca Della Vedova, Murray Patterson

As a result, new methods such as Pangolin, which can scale to the millions of samples of SARS-CoV-2 currently available, have appeared.

Clustering Vocal Bursts Intensity Prediction

Efficient Approximate Kernel Based Spike Sequence Classification

no code implementations • 11 Sep 2022 • Sarwan Ali, Bikram Sahoo, Muhammad Asad Khan, Alexander Zelikovsky, Imdad Ullah Khan, Murray Patterson

More specifically, we improve the quality of the approximate kernel using domain knowledge (computed using information gain) and efficient preprocessing (using minimizers computation) to classify coronavirus spike protein sequences corresponding to different variants (e.g., Alpha, Beta, Gamma).

Classification Clustering

Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification

1 code implementation • 18 Jul 2022 • Sarwan Ali, Bikram Sahoo, Alexander Zelikovskiy, Pin-Yu Chen, Murray Patterson

The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome -- millions of sequences and counting.

Benchmarking BIG-bench Machine Learning

PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences

no code implementations • 6 Jan 2022 • Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Yijing Zhou, Murray Patterson

In coronaviruses, the surface (S) protein, or spike protein, is an important part of determining host specificity since it is the point of contact between the virus and the host cell membrane.

Open-Ended Question Answering Specificity

Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants

1 code implementation • 18 Oct 2021 • Zahra Tayebi, Sarwan Ali, Murray Patterson

We then show that with the appropriate feature selection, we can efficiently and effectively cluster the spike sequences based on the different variants.

Clustering feature selection

Efficient Analysis of COVID-19 Clinical Data using Machine Learning Models

no code implementations • 18 Oct 2021 • Sarwan Ali, Yijing Zhou, Murray Patterson

Applying machine learning algorithms to this big data is a natural approach to this aim, since they can quickly scale to such data and extract the relevant information in the presence of variety and different levels of veracity.

BIG-bench Machine Learning feature selection

Characterizing SARS-CoV-2 Spike Sequences Based on Geographical Location

1 code implementation • 2 Oct 2021 • Sarwan Ali, Babatunde Bello, Zahra Tayebi, Murray Patterson

With the rapid spread of COVID-19 worldwide, viral genomic data is available on the order of millions of sequences in public databases such as GISAID.

Benchmarking Machine Learning Robustness in Covid-19 Spike Sequence Classification

no code implementations • 29 Sep 2021 • Sarwan Ali, Bikram Sahoo, Pin-Yu Chen, Murray Patterson

The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 viral genome --- millions of sequences and counting.

Benchmarking BIG-bench Machine Learning

Spike2Vec: An Efficient and Scalable Embedding Approach for COVID-19 Spike Sequences

1 code implementation • 12 Sep 2021 • Sarwan Ali, Murray Patterson

Through experiments, we show that Spike2Vec is not only scalable on several million spike sequences, but also outperforms the baseline models in terms of prediction accuracy, F1 score, etc.

Effective and scalable clustering of SARS-CoV-2 sequences

no code implementations • 18 Aug 2021 • Sarwan Ali, Tamkanat-E-Ali, Muhammad Asad Khan, Imdadullah Khan, Murray Patterson

Using a $k$-mer based feature vector generation and efficient feature selection methods, our approach is effective in identifying variants, as well as being efficient and scalable to millions of sequences.

Clustering feature selection
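
Several entries in this list build on $k$-mer feature vectors. As a minimal sketch of that general idea, the function below counts every length-k substring of a sequence and lays the counts out in a fixed order over all possible k-mers. The name `kmer_vector`, the default `k=3`, and the default nucleotide alphabet are illustrative assumptions; the papers' own implementations and parameters may differ.

```python
from collections import Counter
from itertools import product


def kmer_vector(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[int]:
    """Return the k-mer count vector of `seq` over `alphabet`.

    The vector has len(alphabet) ** k entries, one per possible k-mer,
    ordered as itertools.product enumerates them.
    """
    # Slide a window of width k across the sequence and count each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Counter returns 0 for k-mers that never occur in the sequence.
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]
```

For protein sequences such as spike sequences, the same function could be called with the 20-letter amino acid alphabet. The vector length then grows as 20 ** k, which is one reason feature selection matters at this scale.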

A k-mer Based Approach for SARS-CoV-2 Variant Identification

no code implementations • 7 Aug 2021 • Sarwan Ali, Bikram Sahoo, Naimat Ullah, Alexander Zelikovskiy, Murray Patterson, Imdadullah Khan

With the rapid spread of the novel coronavirus (COVID-19) across the globe and its continuous mutation, it is of pivotal importance to design a system to identify different known (and unknown) variants of SARS-CoV-2.
