Search Results for author: Sarwan Ali

Found 37 papers, 9 papers with code

Expanding Chemical Representation with k-mers and Fragment-based Fingerprints for Molecular Fingerprinting

no code implementations · 28 Mar 2024 · Sarwan Ali, Prakash Chourasia, Murray Patterson

This study introduces a novel approach that combines substructure counting, $k$-mers, and Daylight-like fingerprints to expand the representation of chemical structures in SMILES strings.

Drug Discovery

A Universal Non-Parametric Approach For Improved Molecular Sequence Analysis

no code implementations · 12 Feb 2024 · Sarwan Ali, Tamkanat E Ali, Prakash Chourasia, Murray Patterson

In this work, we present a novel compression-based approach, motivated by \cite{jiang2023low}, that combines the simplicity of basic compression algorithms such as Gzip and Bz2 with the Normalized Compression Distance (NCD) to achieve better performance on classification tasks without relying on handcrafted features or pre-trained models.
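
To illustrate the compression-based idea in the abstract above, here is a minimal sketch of the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It uses only Python's standard `gzip` and `bz2` modules. The function names (`compressed_size`, `ncd`, `nearest_label`) and the 1-nearest-neighbor step are assumptions made for illustration and are not taken from the paper.

```python
import bz2
import gzip


def compressed_size(data: bytes, method: str = "gzip") -> int:
    """Length in bytes of `data` after compression with the chosen method."""
    if method == "gzip":
        return len(gzip.compress(data))
    if method == "bz2":
        return len(bz2.compress(data))
    raise ValueError(f"unsupported method: {method}")


def ncd(x: str, y: str, method: str = "gzip") -> float:
    """Normalized Compression Distance between two sequences."""
    bx, by = x.encode(), y.encode()
    cx = compressed_size(bx, method)
    cy = compressed_size(by, method)
    cxy = compressed_size(bx + by, method)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest_label(query: str, labeled, method: str = "gzip"):
    """Hypothetical 1-nearest-neighbor classifier over (sequence, label) pairs."""
    best_sequence, best_label = min(
        labeled, key=lambda pair: ncd(query, pair[0], method)
    )
    return best_label
```

A small NCD means that one sequence adds little new information once the other has been compressed, so sequences with shared repeated content end up close to each other.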

A Memetic Algorithm To Find a Hamiltonian Cycle in a Hamiltonian Graph

no code implementations · 1 Feb 2024 · Sarwan Ali, Pablo Moscato

We present a memetic algorithm (MA) approach for finding a Hamiltonian cycle in a Hamiltonian graph.

Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information

no code implementations · 20 Sep 2023 · Sarwan Ali

By combining extrinsic evaluation methods, such as classification and clustering, with t-SNE-based neighborhood analysis, such as neighborhood agreement and trustworthiness, we provide a comprehensive assessment of the representation capacity.

Bayesian Optimization Clustering

Sequence-Based Nanobody-Antigen Binding Prediction

no code implementations · 15 Jul 2023 · Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdad Ullah Khan, Murray Patterson

We curated a comprehensive dataset of Nanobody-Antigen binding and nonbinding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of nanobody and antigen.

CAMP: A Context-Aware Cricket Players Performance Metric

1 code implementation · 14 Jul 2023 · Muhammad Sohaib Ayub, Naimat Ullah, Sarwan Ali, Imdad Ullah Khan, Mian Muhammad Awais, Muhammad Asad Khan, Safiullah Faizullah

We propose Context-Aware Metric of player Performance, CAMP, to quantify individual players' contributions toward a cricket match outcome.

Decision Making

Robust Brain Age Estimation via Regression Models and MRI-derived Features

no code implementations · 8 Jun 2023 · Mansoor Ahmed, Usama Sardar, Sarwan Ali, Shafiq Alam, Murray Patterson, Imdad Ullah Khan

The proposed BAE framework provides a new approach for estimating brain age, which has important implications for the understanding of neurological disorders and age-related brain changes.

Age Estimation regression

T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification

1 code implementation · 25 Apr 2023 · Zahra Tayebi, Sarwan Ali, Prakash Chourasia, Taslim Murad, Murray Patterson

Sparse coding is a popular technique in machine learning that enables the representation of data with a set of informative features and can capture complex relationships between amino acids and identify subtle patterns in the sequence that might be missed by low-dimensional methods.

Multi-class Classification Specificity

Virus2Vec: Viral Sequence Classification Using Machine Learning

no code implementations · 24 Apr 2023 · Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Pin-Yu Chen, Imdad Ullah Khan, Murray Patterson

Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans.

Classification Specificity

PCD2Vec: A Poisson Correction Distance-Based Approach for Viral Host Classification

no code implementations · 13 Apr 2023 · Sarwan Ali, Taslim Murad, Murray Patterson

Therefore, the usage of only the spike protein, instead of the full genome, provides most of the essential information for performing analyses such as host classification.

Specificity
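
The title of the PCD2Vec entry above refers to the Poisson correction distance. As a rough illustration, the textbook form of that distance between two aligned, equal-length sequences is d = -ln(1 - p), where p is the proportion of positions at which the sequences differ. The sketch below implements only this standard definition. The function names are hypothetical, and how PCD2Vec turns such distances into embeddings is not described in this listing.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of positions at which two aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for ca, cb in zip(a, b) if ca != cb)
    return mismatches / len(a)


def poisson_correction_distance(a: str, b: str) -> float:
    """Textbook Poisson correction distance: d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p == 0.0:
        return 0.0
    if p >= 1.0:
        # The correction is undefined when every position differs.
        return math.inf
    return -math.log(1.0 - p)
```

The correction grows faster than p itself, which compensates for multiple substitutions at the same site that a plain mismatch count would miss.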

ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation

1 code implementation · 6 Apr 2023 · Sarwan Ali, Prakash Chourasia, Zahra Tayebi, Babatunde Bello, Murray Patterson

In this work, we propose ViralVectors, a compact feature vector generation from virome sequencing data that allows effective downstream analysis.

4k Decision Making

BioSequence2Vec: Efficient Embedding Generation For Biological Sequences

no code implementations · 1 Apr 2023 · Sarwan Ali, Usama Sardar, Murray Patterson, Imdad Ullah Khan

Kernel-based methods, e.g., SVM, are a proven efficient and useful alternative for several machine learning (ML) tasks such as sequence classification.

Representation Learning

Exploring The Potential Of GANs In Biological Sequence Analysis

no code implementations · 4 Mar 2023 · Taslim Murad, Sarwan Ali, Murray Patterson

New tools for biological sequence analysis are provided by machine learning (ML) technologies to effectively analyze the functions and structures of the sequences.

Anderson Acceleration For Bioinformatics-Based Machine Learning

no code implementations · 1 Feb 2023 · Sarwan Ali, Prakash Chourasia, Murray Patterson

Anderson acceleration (AA) is a well-known method for accelerating the convergence of iterative algorithms, with applications in various fields including deep learning and optimization.

Evaluating COVID-19 Sequence Data Using Nearest-Neighbors Based Network Model

no code implementations · 19 Nov 2022 · Sarwan Ali

Similarly, Euclidean space is not considered the best choice for classification and clustering tasks on biological sequences.

Clustering Graph Mining

Informative Initialization and Kernel Selection Improves t-SNE for Biological Sequences

1 code implementation · 16 Nov 2022 · Prakash Chourasia, Sarwan Ali, Murray Patterson

We show that t-SNE performs significantly better when different techniques, such as informed initialization and kernel matrix selection, are used.

Reads2Vec: Efficient Embedding of Raw High-Throughput Sequencing Reads Data

no code implementations · 15 Nov 2022 · Prakash Chourasia, Sarwan Ali, Simone Ciccolella, Gianluca Della Vedova, Murray Patterson

As a result, new methods such as Pangolin, which can scale to the millions of samples of SARS-CoV-2 currently available, have appeared.

Clustering Vocal Bursts Intensity Prediction

Impact Of Missing Data Imputation On The Fairness And Accuracy Of Graph Node Classifiers

1 code implementation · 1 Nov 2022 · Haris Mansoor, Sarwan Ali, Shafiq Alam, Muhammad Asad Khan, Umair ul Hassan, Imdadullah Khan

In this paper, we analyze the effect on fairness in the context of graph data (node attributes) imputation using different embedding and neural network methods.

Fairness Imputation +1

Efficient Approximate Kernel Based Spike Sequence Classification

no code implementations · 11 Sep 2022 · Sarwan Ali, Bikram Sahoo, Muhammad Asad Khan, Alexander Zelikovsky, Imdad Ullah Khan, Murray Patterson

More specifically, we improve the quality of the approximate kernel using domain knowledge (computed using information gain) and efficient preprocessing (using minimizers computation) to classify coronavirus spike protein sequences corresponding to different variants (e.g., Alpha, Beta, Gamma).

Classification Clustering
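
The entry above mentions preprocessing with minimizers. As a minimal sketch of the general idea, the minimizer of a k-mer is taken here to be its lexicographically smallest substring of length m (with m < k). This forward-only definition and the function names are assumptions for illustration. The paper's exact variant (for example, whether reversed k-mers are also considered) is not stated in this listing.

```python
def minimizer(kmer: str, m: int) -> str:
    """Lexicographically smallest m-mer contained in a k-mer (forward only)."""
    if not 0 < m <= len(kmer):
        raise ValueError("m must satisfy 0 < m <= len(kmer)")
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def minimizers(sequence: str, k: int, m: int) -> list:
    """One minimizer per sliding k-mer window of the sequence."""
    return [
        minimizer(sequence[i:i + k], m)
        for i in range(len(sequence) - k + 1)
    ]
```

Because overlapping k-mers often share the same minimizer, the resulting list contains many repeats, which is what makes minimizers a compact summary of a sequence.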

Information We Can Extract About a User From 'One Minute Mobile Application Usage'

no code implementations · 27 Jul 2022 · Sarwan Ali

Since smartphones are easily available to almost everyone in the modern world, using them to track human activities has become possible.

Activity Recognition

Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification

1 code implementation · 18 Jul 2022 · Sarwan Ali, Bikram Sahoo, Alexander Zelikovskiy, Pin-Yu Chen, Murray Patterson

The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome: millions of sequences and counting.

Benchmarking BIG-bench Machine Learning +1

PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences

no code implementations · 6 Jan 2022 · Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Yijing Zhou, Murray Patterson

In coronaviruses, the surface (S) protein, or spike protein, plays an important part in determining host specificity since it is the point of contact between the virus and the host cell membrane.

Open-Ended Question Answering Specificity

Efficient Analysis of COVID-19 Clinical Data using Machine Learning Models

no code implementations · 18 Oct 2021 · Sarwan Ali, Yijing Zhou, Murray Patterson

Applying machine learning based algorithms to this big data is a natural approach toward this aim, since they can quickly scale to such data and extract the relevant information in the presence of variety and different levels of veracity.

BIG-bench Machine Learning feature selection

Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants

1 code implementation · 18 Oct 2021 · Zahra Tayebi, Sarwan Ali, Murray Patterson

We then show that with the appropriate feature selection, we can efficiently and effectively cluster the spike sequences based on the different variants.

Clustering feature selection

Characterizing SARS-CoV-2 Spike Sequences Based on Geographical Location

1 code implementation · 2 Oct 2021 · Sarwan Ali, Babatunde Bello, Zahra Tayebi, Murray Patterson

With the rapid spread of COVID-19 worldwide, viral genomic data is available on the order of millions of sequences in public databases such as GISAID.

Benchmarking Machine Learning Robustness in Covid-19 Spike Sequence Classification

no code implementations · 29 Sep 2021 · Sarwan Ali, Bikram Sahoo, Pin-Yu Chen, Murray Patterson

The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 viral genome: millions of sequences and counting.

Benchmarking BIG-bench Machine Learning +1

Locally Weighted Mean Phase Angle (LWMPA) Based Tone Mapping Quality Index (TMQI-3)

no code implementations · 17 Sep 2021 · Inaam Ul Hassan, Abdul Haseeb, Sarwan Ali

An HDR image comprises multiple narrow-range-exposure images combined into one high-quality image.

Tone Mapping

Spike2Vec: An Efficient and Scalable Embedding Approach for COVID-19 Spike Sequences

1 code implementation · 12 Sep 2021 · Sarwan Ali, Murray Patterson

Through experiments, we show that Spike2Vec is not only scalable on several million spike sequences, but also outperforms the baseline models in terms of prediction accuracy, F1 score, etc.

Computing Graph Descriptors on Edge Streams

no code implementations · 2 Sep 2021 · Zohair Raza Hassan, Sarwan Ali, Imdadullah Khan, Mudassir Shabbir, Waseem Abbas

Operating on edge streams allows us to avoid storing the entire graph in memory, and controlling the sample size enables us to keep the runtime of our algorithms within desired bounds.

Anomaly Detection Classification

Effective and scalable clustering of SARS-CoV-2 sequences

no code implementations · 18 Aug 2021 · Sarwan Ali, Tamkanat-E-Ali, Muhammad Asad Khan, Imdadullah Khan, Murray Patterson

Using $k$-mer based feature vector generation and efficient feature selection methods, our approach is effective in identifying variants, as well as being efficient and scalable to millions of sequences.

Clustering feature selection
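
The $k$-mer based feature vectors mentioned in the entry above can be sketched as a fixed-length count vector with one slot per possible $k$-mer. The version below is a minimal illustration under stated assumptions: the function name `kmer_spectrum`, the default nucleotide alphabet, and the choice to skip $k$-mers containing characters outside the alphabet are all hypothetical and not taken from the paper.

```python
from itertools import product


def kmer_spectrum(sequence: str, k: int = 3, alphabet: str = "ACGT") -> list:
    """Count vector of length len(alphabet)**k over all possible k-mers."""
    # Map every possible k-mer to a fixed position, in lexicographic order.
    index = {
        "".join(chars): position
        for position, chars in enumerate(product(alphabet, repeat=k))
    }
    vector = [0] * len(index)
    # Slide a window of length k over the sequence and count each k-mer.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip k-mers with characters outside the alphabet
            vector[index[kmer]] += 1
    return vector
```

For protein sequences such as spike sequences, the same function could be called with a 20-letter amino acid alphabet, at the cost of a vector of length 20**k.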

A k-mer Based Approach for SARS-CoV-2 Variant Identification

no code implementations · 7 Aug 2021 · Sarwan Ali, Bikram Sahoo, Naimat Ullah, Alexander Zelikovskiy, Murray Patterson, Imdadullah Khan

With the rapid spread of the novel coronavirus (COVID-19) across the globe and its continuous mutation, it is of pivotal importance to design a system to identify different known (and unknown) variants of SARS-CoV-2.

Effect of Analysis Window and Feature Selection on Classification of Hand Movements Using EMG Signal

no code implementations · 2 Feb 2020 · Asad Ullah, Sarwan Ali, Imdadullah Khan, Muhammad Asad Khan, Safiullah Faizullah

In this paper, we investigate the effect of the analysis window and feature selection on classification accuracy of different hand and wrist movements using time-domain features.

BIG-bench Machine Learning Classification +3

Short-Term Load Forecasting Using AMI Data

no code implementations · 28 Dec 2019 · Haris Mansoor, Sarwan Ali, Imdadullah Khan, Naveed Arshad, Muhammad Asad Khan, Safiullah Faizullah

A prominent feature of FMF is that it works at any user-specified level of granularity, in both the temporal dimension (from a single hour to days) and the spatial dimension (from a single household to groups of consumers).

Load Forecasting

Predicting Attributes of Nodes Using Network Structure

no code implementations · 27 Dec 2019 · Sarwan Ali, Muhammad Haroon Shakeel, Imdadullah Khan, Safiullah Faizullah, Muhammad Asad Khan

Predicting node attributes in such graphs is an important problem with applications in many domains like recommendation systems, privacy preservation, and targeted advertisement.

Attribute Recommendation Systems
