A large-scale systematic survey reveals recurring molecular features of public antibody responses to SARS-CoV-2
Global research to combat the COVID-19 pandemic has led to the isolation and characterization of thousands of human antibodies to the SARS-CoV-2 spike protein, providing an unprecedented opportunity to study the antibody response to a single antigen. Using the information derived from 88 research publications and 13 patents, we assembled a dataset of ~8,000 human antibodies to the SARS-CoV-2 spike protein from >200 donors. By analyzing immunoglobulin V and D gene usages, complementarity-determining region H3 sequences, and somatic hypermutations, we demonstrated that the common (public) responses to different domains of the spike protein were quite different. We further used these sequences to train a deep-learning model to accurately distinguish between the human antibodies to SARS-CoV-2 spike protein and those to influenza hemagglutinin protein. Overall, this study provides an informative resource for antibody research and enhances our molecular understanding of public antibody responses.
Results from the Paper
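Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Antibody-antigen binding prediction | Antibody sequences against Sars-Cov-2 and Omicron BA.1 | Transformer | Accuracy (5-fold) | 65.8% | #3 |
Antibody-antigen binding prediction | Antibody sequences against Sars-Cov-2 and Omicron BA.1 | Transformer | Recall (5-fold) | 35.3% | #3 |
Antibody-antigen binding prediction | Antibody sequences against Sars-Cov-2 and Omicron BA.1 | Transformer | Precision (5-fold) | 82.9% | #1 |
Antibody-antigen binding prediction | Antibody sequences against Sars-Cov-2 and Omicron BA.1 | Transformer | Selectivity (5-fold) | 93.4% | #1 |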
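For readers unfamiliar with the four metric names, the following minimal sketch shows how they are conventionally derived from the counts of a binary confusion matrix. It assumes that "Selectivity" here means specificity (the true-negative rate) and that label 1 marks the positive (binding) class; neither is stated on this page. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper or the benchmark.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, recall, precision and selectivity for 0/1 labels.

    Assumption: 1 = positive class, and selectivity = TN / (TN + FP).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "selectivity": ratio(tn, tn + fp),
    }


# Toy labels for illustration only; these are not data from the benchmark.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, a classifier that rarely predicts the positive class can combine high precision and selectivity with low recall, which is the pattern in the table above. The "(5-fold)" suffix presumably denotes values averaged over 5-fold cross-validation, in which case the function would be applied to each held-out fold and the results averaged.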