NEC Laboratories Europe

Biomedical AI
Publications

Daniel Rose, Chia-Chien Hung, Marco Lepri, Israa Alqassem, Kiril Gashteovski, Carolin Lawrence: “MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis”, Association for Computational Linguistics (ACL) 2025

Paper Details

Abstract:

Differential Diagnosis (DDx) is a fundamental yet complex aspect of clinical decision-making, in which physicians iteratively refine a ranked list of possible diseases based on symptoms, antecedents, and medical knowledge. While recent advances in large language models (LLMs) have shown promise in supporting DDx, existing approaches face key limitations, including single-dataset evaluations, isolated optimization of components, unrealistic assumptions about complete patient profiles, and single-attempt diagnosis. We introduce a Modular Explainable DDx Agent (MEDDxAgent) framework designed for interactive DDx, where diagnostic reasoning evolves through iterative learning, rather than assuming a complete patient profile is accessible. MEDDxAgent integrates three modular components: (1) an orchestrator (DDxDriver), (2) a history taking simulator, and (3) two specialized agents for knowledge retrieval and diagnosis strategy. To ensure robust evaluation, we introduce a comprehensive DDx benchmark covering respiratory, skin, and rare diseases. We analyze single-turn diagnostic approaches and demonstrate the importance of iterative refinement when patient profiles are not available at the outset. Our broad evaluation demonstrates that MEDDxAgent achieves over 10% accuracy improvements in interactive DDx across both large and small LLMs, while offering critical explainability into its diagnostic reasoning process.

Accepted at: The Association for Computational Linguistics (ACL) 2025

In collaboration with: University of California, Santa Barbara, CAIR – Ss. Cyril and Methodius University of Skopje

Paper link: https://arxiv.org/pdf/2502.19175

Leonardo Castorina, Filippo Grazioli, Pierre Machart, Anja Moesch, Federico Errica: "Assessing the generalization capabilities of TCR binding predictors via peptide distance analysis", PLOS ONE 2025

Paper Details

Abstract:

Understanding the interaction between T Cell Receptors (TCRs) and peptide-bound Major Histocompatibility Complexes (pMHCs) is crucial for comprehending immune responses and developing targeted immunotherapies. While recent machine learning (ML) models show remarkable success in predicting TCR-pMHC binding within training data, these models often fail to generalize to peptides outside their training distributions, raising concerns about their applicability in therapeutic settings. Understanding and improving the generalization of these models is therefore critical to ensure real-world applications. To address this issue, we evaluate the effect of the distance between training and testing peptide distributions on ML model empirical risk assessments, using sequence-based and 3D structure-based distance metrics. In our analysis we use several state-of-the-art models for TCR-peptide binding prediction: Attentive Variational Information Bottleneck (AVIB), NetTCR-2.0 and -2.2, and ERGO II (pre-trained autoencoder) and ERGO II (LSTM). In this work, we introduce a novel approach for assessing the generalization capabilities of TCR binding predictors: the Distance Split (DS) algorithm. The DS algorithm controls the distance between training and testing peptides based on both sequence and structure, allowing for a more nuanced evaluation of model performance. We show that lower 3D shape similarity between training and test peptides is associated with a harder out-of-distribution task definition, which is more interesting when measuring the ability to generalize to unseen peptides. However, we observe the opposite effect when splitting using sequence-based similarity. These findings highlight the importance of using a distance-based splitting approach to benchmark models. This could then be used to estimate a confidence score on predictions on novel and unseen peptides, based on how different they are from the training ones. 
Additionally, our results may hint that employing 3D shape to complement sequence information could improve the accuracy of TCR-pMHC binding predictors.
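
The abstract does not spell out how the Distance Split algorithm works internally, so the following is only a minimal sketch of the general idea of a distance-controlled split, not the paper's method. It assumes Levenshtein edit distance as a stand-in sequence metric (the 3D structure-based metric is not shown), and the function names, thresholds, and peptide strings are hypothetical toy values.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def train_peptides_in_band(peptides, test_peptides, min_dist, max_dist):
    """Keep as training peptides only those whose distance to the nearest
    test peptide lies in [min_dist, max_dist]. Assumed illustration only."""
    test_set = set(test_peptides)
    train = []
    for pep in peptides:
        if pep in test_set:
            continue  # never train on a test peptide
        nearest = min(edit_distance(pep, t) for t in test_peptides)
        if min_dist <= nearest <= max_dist:
            train.append(pep)
    return train


# Toy strings, not real peptides.
peptides = ["AAAA", "AAAC", "AACC", "CCCC", "AAAT"]
far_train = train_peptides_in_band(peptides, ["AAAA"], min_dist=2, max_dist=4)
print(far_train)
```

Raising `min_dist` pushes the training peptides further from the test peptides, which is one simple way to vary how far out of distribution the evaluation is.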

Published in: PLOS ONE 2025

In collaboration with: School of Informatics, University of Edinburgh

Paper link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0324011

Israa Alqassem, Piyush Borole, Ammar Shaker, Ajitha Rajan: "Tracing Pain: Predictive Modeling for Migraine and Headache Triggers", Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI) 2025

Paper Details

Abstract:

Forecasting migraine attacks before they occur can significantly enhance the quality of life for patients by allowing them to take preemptive measures to mitigate or prevent the onset of symptoms. In this study, we aim to predict migraine attacks by leveraging the largest real-world migraine dataset available to date. Collected from a mobile app between 2016 and 2022, it encompasses approximately 43,000 users and 7 million daily records gathered throughout the entire 6-year period. We introduce TRACE (Temporal Recurrent Autoencoder for Concept Embedding), a novel approach developed to anticipate migraine attacks. On two held-out sets, using 2-day and 3-day lookback data, TRACE successfully predicts approximately 70% of migraine episodes before their onset, outperforming competing methods in terms of sensitivity, i.e., accurately predicting migraine days. TRACE maintains a comparable specificity of around 60%, i.e., accurately predicting no-headache days. To understand the predictive factors driving the models' performance, we conduct both global and local feature importance analyses. Global importance is assessed using ablation analysis and Random Forest, while local importance is evaluated on a per-patient basis using SHapley Additive exPlanations (SHAP). Our SHAP analysis shows that while common migraine triggers exist, individual triggers can vary significantly, highlighting the need for personalized predictive tools.
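
As a reading aid for the two metrics quoted in the abstract, here is a minimal sketch of how sensitivity and specificity are computed from binary day-level labels. It assumes the encoding 1 = migraine day and 0 = no-headache day; the function name and the toy label vectors are made up for illustration and are not taken from the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = migraine day)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # migraine day, predicted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # migraine day, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # quiet day, predicted quiet
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # quiet day, false alarm
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels only (hypothetical), chosen so the arithmetic is easy to follow.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case 4 of 5 migraine days are caught and 3 of 5 quiet days are correctly left unflagged, which shows how a model can trade false alarms for fewer missed attacks.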

Presented at: Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI) 2025

In collaboration with: University of Edinburgh

Full paper download: Tracing_Pain_Predictive_Modeling_for_Migraine_and_Headache_Triggers_ICHI_2025.pdf

Jonathan Warrell, Francesco Alesiani, Cameron Smith, Anja Mösch, Martin Renqiang Min: "Discrete-Continuous Variational Optimization with Local Gradients", NeurIPS 2024 Workshop

Paper Details

Abstract:

Variational optimization (VO) offers a general approach for handling objectives which may involve discontinuities, or whose gradients are difficult to calculate. By introducing a variational distribution over the parameter space, such objectives are smoothed, and rendered amenable to VO methods. Local gradient information, though, may be available in certain problems, which is neglected by such an approach. We therefore consider a general method for incorporating local information via an augmented VO objective function to accelerate convergence and improve accuracy. We show how our augmented objective can be viewed as an instance of multilevel optimization. Finally, we show our method can train a genetic algorithm simulator, using a recursive Wasserstein distance objective.
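
For readers unfamiliar with VO, the standard construction can be written as follows. This is the generic textbook form, not the augmented objective proposed in the paper; the symbols f (objective), θ (parameters), q (variational distribution) and φ (its parameters) are notation assumed here.

```latex
\min_{\theta} f(\theta)
  \;\le\; \mathbb{E}_{\theta \sim q(\theta \mid \phi)}\big[ f(\theta) \big]
  \;=:\; U(\phi),
\qquad
\nabla_{\phi} U(\phi)
  = \mathbb{E}_{\theta \sim q(\theta \mid \phi)}\big[ f(\theta)\, \nabla_{\phi} \log q(\theta \mid \phi) \big].
```

The bound holds because an expectation can never fall below the minimum of f, and the gradient only requires evaluating f at sampled points, which is why discontinuous objectives become tractable.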

Presented at: Optimization for ML Workshop NeurIPS 2024

In collaboration with: NEC Laboratories America, Yale University, Massachusetts General Hospital, Harvard Medical School

Paper link: https://openreview.net/pdf?id=MjwQc0jkF3

Martin Renqiang Min, Kazuhide Onoguchi, Tianxiao Li, Daiki Mori, Jonathan Warrell, Pierre Machart, Anja Mösch, Andrea Meiser, Ivy Grace Pait, Ayako Okamura, Daisuke Muraoka, Hirokazu Matsushita, Kaidre Bendjama: "Design of enhanced TCR against cancer antigens using an AI system", Journal for ImmunoTherapy of Cancer 2024

Paper Details

Abstract:

Background: Naturally occurring TCRs targeting cancer antigens are associated with relatively low affinity compared with TCRs targeting external pathogens. This might be explained by the proximity of cancer-specific sequences to self. Engineering modified, affinity-enhanced TCRs constitutes a possible solution; however, TCR binding remains challenging to model using structural biology approaches because of the conformational flexibility of the TCR complex. The use of machine learning-based methods constitutes a promising approach to designing TCRs of higher affinity. Herein, we report enhanced-affinity TCR sequences against cancer antigens designed using TCRPPO, a proprietary pipeline for TCR sequence optimization.

Methods: TCRPPO is a new reinforcement-learning framework based on proximal policy optimization to optimize TCRs through a mutation policy. Briefly, after training the system on a series of TCR sequences known to bind a given target, TCRPPO introduces mutations into an existing sequence to achieve higher affinity, guided by a reward function that factors in the affinity of the new sequence and the likelihood that this sequence is a valid TCR. To validate our approach, we designed a series of candidate TCR sequences against known clinically relevant cancer antigens (KRAS G12V and MART-1) and evaluated their biological functional potency. To do so, genes encoding variable regions of the original and optimized TCRα and β chains were assembled into plasmid vectors containing a constant region of a TCRα or TCRβ chain. TAP fragments of TCRα and TCRβ together with a NFAT-Luc reporter plasmid were transfected into the ΔTCR Jurkat cell line. The cells were cultured in the presence of antigen presenting cells with or without target peptide, and then the activation of the reporter gene was measured by luciferase assay.

Results: Our AI-based TCR engineering approach generated valid enhanced TCR sequences against the selected epitopes. Engineered TCR-transfected cells showed higher activity in the functional assay, demonstrating that TCRs generated using a mutation policy can achieve higher biological activity than endogenous TCRs. Enhanced TCRs generated against KRAS G12V and MART-1 are dissimilar from already described TCRs [1].

Conclusions: We successfully engineered TCRs to have better antigen recognition. The enhanced TCRs warrant further characterization to evaluate their therapeutic potential. Beyond this case, our approach constitutes a pipeline that might be applied to other targets for which alternative TCRs are required.

Reference

1. Chen Z, Min MR, Guo H, Cheng C, Clancy T, Ning X. T-cell receptor optimization with reinforcement learning and mutation policies for precision immunotherapy. In: Tang H (ed.) Research in Computational Molecular Biology. RECOMB 2023. Lecture Notes in Computer Science, vol 13976. Springer, Cham. https://doi.org/10.1007/978-3-031-29119-7_11.

Published in: Journal for ImmunoTherapy of Cancer 2024

In collaboration with: NEC Laboratories America, NEC Corporation, Aichi Cancer Center Research Institute, NEC OncoImmunity

Paper link: https://jitc.bmj.com/content/12/Suppl_2/A1371

A. Lalanne, C. Jamet, JP Delord, C. Ottensmeier, C. Le Tourneau, A. Tavernaro, G. Lacoste, B. Bastien, M. Brandely, B. Grellier, E. Quemeneur, Y. Yamashita, K. Onoue, N. Yamagata, Y. Tanaka, K. Onoguchi, I. G. Pait, B. Malone, O. Baker, P. Brattas, M. Gheorghe, R. Stratford, T. Clancy, K. Bendjama, O. Lantz: "Personalized vaccine TG4050 induces polyepitopic immune responses against private neoantigens in resected HPV-negative head and neck cancers", American Association for Cancer Research (AACR) 2024

Paper Details

Abstract:

Background: 

T cells targeting tumor specific mutations drive anti-tumor immune responses. TG4050 is a novel viral-based personalized cancer vaccine, which encodes up to 30 patient- and tumor-specific sequences bearing in-silico predicted class I and class II epitopes. TG4050 may prime an adaptive immune response against tumor antigens and prevent relapse in resected HNSCC.


Methods: 

Eligible patients with completely resected, stage III or IV HPV-negative squamous-cell carcinoma of the oral cavity, oropharynx, hypopharynx or larynx were randomized after completion of standard of care adjuvant radio-chemotherapy to receive TG4050 immediately (arm A) or upon relapse (arm B). TG4050 was administered subcutaneously weekly for 6 weeks, then every 3 weeks for a total of 20 doses. Safety and immunogenicity were evaluated. Vaccine response was assessed using two independent assays: ex vivo IFNg ELISPOT against individual vaccine targets and, when feasible, tetramer staining for the targeted epitopes. Additionally, in selected patients, TCR sequencing or scRNAseq VDJ analysis of tetramer-sorted cells evaluated the systemic clonal expansions of tumor-infiltrating lymphocytes (TILs) or the transcriptome of vaccine-specific lymphocytes after vaccination, respectively.


Results: 

17 patients were randomized in arm A and 16 in arm B. Targets for the design of a vaccine were identified in all patients. All TG4050-related adverse events were mild to moderate and most were injection site reactions. After a median follow-up of 16.2 months, no relapse occurred in arm A while 3 patients relapsed in arm B after 6.2, 8.8, and 18.5 months. Immune response was studied in 18 patients after vaccination with TG4050: 16 as monotherapy and 2 during disease relapse in combination with chemoimmunotherapy. ELISPOT assay evidenced priming of neoantigen-specific CD8+ T cells in 17/18 (94%) patients (16 in arm A and 1 in arm B). T-cell responses were either de novo (undetectable prior to vaccine) (82%) or amplification of pre-existing responses (18%). The median number of vaccine responses was 6 (0 - 19). Cytometric characterization of vaccine specific T-cells indicated an effector memory phenotype. TCR sequencing of blood T cells in 5 patients evidenced expansion of TIL clonotypes. Finally, the most expanded vaccine specific CD8 T cells found in the blood of 2 patients represented oligoclonal expansions expressing an effector phenotype. Thus, both TCR analyses suggest expansion of clinically relevant T cell clones after TG4050 treatment.


Conclusion: 

TG4050 design and manufacturing were feasible in head and neck cancers. Vaccination was safe and induced polyclonal immune responses against vaccine targets alone or during concurrent treatment with immunotherapy. The association of vaccine treatment with lower rates of relapse warrants further clinical confirmation in this indication.

Presented at: American Association for Cancer Research (AACR) 2024

In collaboration with: Institut Curie, IUCT Oncopole, The Clatterbridge Cancer Centre NHS Foundation Trust, Transgene SA, NEC Corporation, NEC OncoImmunity AS

Poster link: https://www.transgene.fr/wp-content/uploads/20240209_AACR2024_TG4050_poster.pdf

Christophe Le Tourneau, Jean-Pierre Delord, Ana Lalanne, Camille Jamet, Clementine Spring-Giusti, Annette Tavernaro, Berangere Bastien, Maud Brandely-Talbot, Eric Quemeneur, Kousuke Onoue, Yoshiko Yamashita, Naoko Yamagata, Kazuhide Onoguchi, Ivy Grace Pait, Brandon Malone, Oliver Baker, Per Ludvik Brattas, Olivier Lantz, Kaidre Bendjama, Christian Ottensmeier: "Randomized phase I trial of adjuvant individualized TG4050 vaccine in patients with locally advanced resected HPV-negative head and neck squamous cell carcinoma (HNSCC)", Journal for ImmunoTherapy of Cancer 2024

Paper Details

Abstract:

Background: T cells targeting tumor specific mutations drive anti-tumor immune responses. TG4050 is a novel viral-based personalized cancer vaccine, encoding up to 30 patient- and tumor-specific sequences bearing in-silico predicted class I and class II epitopes. TG4050 may prime an adaptive immune response against tumor antigens and prevent relapse in patients with locally advanced resected HNSCC. (NCT04183166)

Methods: Eligible patients with resected stage III or IV HPV-negative HNSCC were randomized after completion of standard of care adjuvant (chemo)radiotherapy to receive TG4050 immediately (arm A) or upon relapse (arm B). TG4050 was administered subcutaneously weekly for 6 weeks, then every 3 weeks for a total of 20 doses. Safety, efficacy and immunogenicity were evaluated. Longitudinal vaccine response was assessed by tetramer staining against target epitopes. In selected patients we explored tumor specificity and clonal expansion using bulk and single-cell (sc)TCR sequencing.

Results: 17 patients were randomized in arm A and 16 in arm B. All TG4050-related adverse events were mild to moderate. Disease-free survival data after a median follow-up of 24 months will be presented. Immune response was assessed after vaccination with TG4050 in 16 patients as monotherapy and in 2 at relapse in combination with chemoimmunotherapy. ELISPOTs evidenced priming of neoantigen-specific T cells in 17/18 (94%) patients. T-cell responses were either de novo (undetectable prior to vaccination) (82%) or amplification of pre-existing responses (18%). The median number of neoantigen responses was 6 (0 – 19). Frequency of tetramer positive CD8+ T cells was evaluated in 9 patients and increased by 100 to 1000-fold as early as 8 days after initiation, reaching a plateau by day 22 and sustained over the follow-up period. Moreover, we identified clones targeting vaccine epitopes by scTCR sequencing of tetramer positive cells (10 tumoral specificities in 5 patients). Using bulk TCR sequencing data in 2 patients, we show that these antigen specific clones were found in TILs at baseline in the tumor but absent in the periphery prior to vaccination and were significantly expanded after treatment.

Conclusions: TG4050 design was well tolerated in patients with locally advanced resected HNSCC and induced sustained immune responses. Efficacy data will be presented.

Trial Registration: NCT04183166.

Ethics Approval: The appropriate local or national ethics body for each participating centre approved the study protocol and all amendments. All participants provided written, informed consent.
• France: Name: Comité de Protection des Personnes Ile de France 5, ID: 35725 (EUCT 2023-508561-33)
• UK: Name: South Central - Oxford A Research Ethics Committee, ID: 19/SC/0500 (EudraCT 2018-003267-58)
• US: Name: Mayo Clinic Institutional Review Boards, ID: 20-007421

Published in: Journal for ImmunoTherapy of Cancer 2024

In collaboration with: Institut Curie, Institut Claudius Regaud-Oncopole, Transgene, NEC Corporation, NEC OncoImmunity AS, The Clatterbridge Cancer Centre NHS Foundation Trust

Paper link: https://jitc.bmj.com/content/12/Suppl_2/A746

Anja Mösch, Filippo Grazioli, Pierre Machart, Brandon Malone: “NeoAgDT: Optimization of personal neoantigen vaccine composition by digital twin simulation of a cancer cell population”, Bioinformatics 2024

Paper Details

Abstract:

Motivation: Neoantigen vaccines make use of tumor-specific mutations to enable the patient’s immune system to recognize and eliminate cancer. Selecting vaccine elements, however, is a complex task which needs to take into account not only the underlying antigen presentation pathway but also tumor heterogeneity.
Results: Here, we present NeoAgDT, a two-step approach consisting of: (1) simulating individual cancer cells to create a digital twin of the patient’s tumor cell population and (2) optimizing the vaccine composition by integer linear programming based on this digital twin. NeoAgDT shows improved selection of experimentally-validated neoantigens over ranking-based approaches in a study of seven patients.

Published in: Bioinformatics 2024

Paper link: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btae205/7645414

In collaboration with: NEC OncoImmunity AS

Filippo Grazioli, Pierre Machart, Anja Mösch, Kai Li, Leonardo V Castorina, Nico Pfeifer, Martin Renqiang Min: "Attentive Variational Information Bottleneck for TCR–peptide interaction prediction", Bioinformatics 2023

Paper Details

Abstract:

Motivation

We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental immuno-oncology problem: predicting the interactions between T-cell receptors (TCRs) and peptides.

Results

Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art methods for TCR–peptide interaction prediction. Additionally, we show that the latent posterior distribution learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution amino acid sequences.

Availability and implementation

The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr.

Supplementary information

Supplementary data are available at Bioinformatics online.

Published in: Bioinformatics 2023

In collaboration with: NEC Laboratories America, School of Informatics - University of Edinburgh, University of Tübingen

Paper link: https://academic.oup.com/bioinformatics/article/39/1/btac820/6960920

Christian H.H. Ottensmeier, Jean-Pierre Delord, Ana Lalanne, Olivier Lantz, Camille Jamet, Annette Tavernaro, Maud Brandely-Talbot, Benoît Grellier, Berangere Bastien, Hakim Makhloufi, Thierry Huss, Yoshiko Yamashita, Kousuke Onoue, Naoko Yamagata, Yuki Tanaka, Brandon Malone, Oliver Baker, Eric Quemeneur, Kaidre Bendjama, Christophe Le Tourneau: "Safety and immunogenicity of TG4050: A personalized cancer vaccine in head and neck carcinoma", American Society of Clinical Oncology (ASCO) 2023

Paper Details

Abstract:

Background: 

Despite adjuvant therapy, over 50% of surgically treated head and neck squamous cell carcinoma (HNSCC) patients (pts) experience a recurrence of disease. Systemic stimulation of cellular immunity against tumor mutations using a viral vaccine may be an ideal modality to clear residual cancer cells. For this purpose, we developed a pipeline for the design of TG4050, a personalized cancer vaccine (PCV) using a Modified Vaccinia Ankara (MVA) viral vector. We report here preliminary safety and immunogenicity data from a phase I TG4050 study. 

Methods: 

Surgically resected stage III or IV, HPV-negative HNSCC pts were enrolled in the study. Pts must have achieved clinical remission after adjuvant chemoradiotherapy. A PCV for each pt was manufactured with up to 30 neoantigens identified from next generation sequencing (NGS) data using a state-of-the-art machine learning algorithm. Pts randomized to arm A received the PCV after completion of primary treatment. Pts randomized to arm B received the PCV in the event of relapse, in conjunction with second line therapy. The PCV schedule consisted of an induction period of 6 weekly administrations, followed by booster doses once every 3 weeks for up to one year. Immune cells were collected by leukapheresis at baseline and at day 64. Primary endpoint was safety. Secondary endpoints included feasibility, disease free survival and immune response as assessed by ex-vivo IFNg-ELISPOT.

Results: 

At the time of data cut-off, a total of 31 pts were randomized, 15 in arm A and 16 in arm B. A vaccine was successfully designed for all randomized pts. Pts had no evidence of disease at baseline either at the clinical or molecular level, as assessed by ctDNA. All adverse events (AEs) were mild to moderate and most were injection site reactions. Median follow-up was 9.2 months in arm A vs 7.6 months in arm B. None of the pts in arm A experienced relapse vs. 2 in arm B. Immune monitoring demonstrated priming of a polyepitopic T cell response against the PCV in 100% of pts in arm A, among pts evaluated to date, with a mean of 9 responses per pt (6-19). Responses were observed regardless of HLA genotype, and without cross-reactivity to the wildtype antigen. Baseline tumor analyses revealed challenging genomic and immune profiles such as low TMB (avg of 3.06 ± 0.86 Mut/Mb), a majority of immune-desert tumors, and a low expression of important immune related factors including PD-L1 (16 pts out of 17 had a negative to moderate PD-L1 expression).

Conclusions: 

Our preliminary data demonstrate that TG4050 is safe, well tolerated, and capable of inducing T cell responses in cold tumors. In summary, viral-based PCVs designed to induce tumor-specific neoantigen responses may be associated with good tolerability and an improved outcome in HNSCC pts. Clinical trial information: NCT04183166.

Presented at: American Society of Clinical Oncology (ASCO) 2023

In collaboration with: The Clatterbridge Cancer Centre NHS Foundation Trust, IUCT Oncopole, Institut Curie, Transgene SA, NEC Corporation, NEC OncoImmunity AS

Paper link: https://meetings.asco.org/abstracts-presentations/218595

Filippo Grazioli, Anja Mösch, Pierre Machart, Kai Li, Israa Alqassem, Timothy J. O’Donnell, Martin Renqiang Min: "On TCR binding predictors failing to generalize to unseen peptides", Frontiers in Immunology, 2022

Paper Details

Abstract:
Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine or deep learning approaches. Many of these methods achieve impressive results on test sets which include peptide sequences that are also included in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset called TChard, which includes positive samples from IEDB, VDJdb, McPAS-TCR and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We propose the hard split, a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models’ test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation of why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.
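
The property the hard split guarantees (no peptide appears in both training and test sets) can be illustrated with a minimal sketch. This is not the TChard implementation: it assumes samples are (tcr, peptide, label) tuples and that test peptides are chosen at random, and the function name, parameters, and placeholder rows are hypothetical.

```python
import random
from collections import defaultdict


def hard_split(samples, test_fraction=0.2, seed=0):
    """Split (tcr, peptide, label) tuples so that train and test share no peptide.

    Assumption: whole peptides are assigned to the test set at random.
    """
    by_peptide = defaultdict(list)
    for sample in samples:
        by_peptide[sample[1]].append(sample)

    peptides = sorted(by_peptide)              # deterministic order before shuffling
    random.Random(seed).shuffle(peptides)
    n_test = max(1, round(len(peptides) * test_fraction))

    test = [s for p in peptides[:n_test] for s in by_peptide[p]]
    train = [s for p in peptides[n_test:] for s in by_peptide[p]]
    return train, test


# Placeholder rows, not real sequences or labels.
samples = [
    ("TCR_A", "PEP_1", 1),
    ("TCR_B", "PEP_1", 1),
    ("TCR_C", "PEP_2", 1),
    ("TCR_D", "PEP_3", 0),
    ("TCR_E", "PEP_4", 1),
]
train, test = hard_split(samples, test_fraction=0.25)
shared = {s[1] for s in train} & {s[1] for s in test}
print(len(train), len(test), shared)
```

A purely random split over samples would usually place `PEP_1` on both sides, which is exactly the leakage the hard split is meant to rule out.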

Published in: Frontiers in Immunology, 2022

Research partners: Icahn School of Medicine at Mount Sinai, NEC Laboratories America

Full paper download: TCR_Binding_Predictors_Failing_to_Generalize_to_Unseen_Peptides.pdf

F. Grazioli, R. Siarheyeu, I. Alqassem, A. Henschel, G. Pileggi, A. Meiser: "Microbiome-based disease prediction with multimodal variational information bottlenecks", PLOS Computational Biology, April 2022

Paper Details

Abstract:
Scientific research is shedding light on the interaction of the gut microbiome with the human host and on its role in human health. Existing machine learning methods have shown great potential in discriminating healthy from diseased microbiome states. Most of them leverage shotgun metagenomic sequencing to extract gut microbial species-relative abundances or strain-level markers. Each of these gut microbial profiling modalities showed diagnostic potential when tested separately; however, no existing approach combines them in a single predictive framework. Here, we propose the Multimodal Variational Information Bottleneck (MVIB), a novel deep learning model capable of learning a joint representation of multiple heterogeneous data modalities. MVIB achieves competitive classification performance while being faster than existing methods. Additionally, MVIB offers interpretable results. Our model adopts an information theoretic interpretation of deep neural networks and computes a joint stochastic encoding of different input data modalities. We use MVIB to predict whether human hosts are affected by a certain disease by jointly analysing gut microbial species-relative abundances and strain-level markers. MVIB is evaluated on human gut metagenomic samples from 11 publicly available disease cohorts covering 6 different diseases. We achieve high performance (0.80 < ROC AUC < 0.95) on 5 cohorts and at least medium performance on the remaining ones. We adopt a saliency technique to interpret the output of MVIB and identify the most relevant microbial species and strain-level markers to the model’s predictions. We also perform cross-study generalisation experiments, where we train and test MVIB on different cohorts of the same disease, and overall we achieve comparable results to the baseline approach, i.e. the Random Forest. Further, we evaluate our model by adding metabolomic data derived from mass spectrometry as a third input modality. 
Our method is scalable with respect to input data modalities and has an average training time of < 1.4 seconds. The source code and the datasets used in this work are publicly available.

Published in: PLOS Computational Biology, April 2022

Paper available at: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010050

Full paper download: Microbiome-based_Disease_Prediction_with_Multimodal_Variational_Information_Bottlenecks.pdf

J. Cheng, K. Ritter, K. Bendjama, B. Malone: “BERTMHC: improved MHC–peptide class II interaction prediction with transformer and multiple instance learning”, Bioinformatics 2021

Paper Details

Abstract:

Motivation: Increasingly comprehensive characterization of cancer-associated genetic alterations has paved the way for the development of highly specific therapeutic vaccines. Predicting precisely the binding and presentation of peptides to major histocompatibility complex (MHC) alleles is an important step toward such therapies. Recent data suggest that presentation of both class I and II epitopes is critical for the induction of a sustained effective immune response. However, the prediction performance for MHC class II has been limited compared to class I.

Results: We present a transformer neural network model which leverages self-supervised pretraining from a large corpus of protein sequences. We also propose a multiple instance learning (MIL) framework to deconvolve mass spectrometry data where multiple potential MHC alleles may have presented each peptide. We show that pretraining boosted the performance for these tasks. Combining pretraining and the novel MIL approach, our model outperforms state-of-the-art models based on peptide and MHC sequence only for both binding and cell surface presentation predictions.

Availability and implementation: Our source code is available at github.com/s6juncheng/BERTMHC under a noncommercial license. A webserver is available at bertmhc.privacy.nlehd.de

Published in: Bioinformatics

Full paper download: BERTMHC_improved_MHC–peptide_class_II_interaction_prediction.pdf

"Learning Representations of Missing Data using Graph Neural Networks for Predicting Patient Outcomes", AAAI Workshop 2021

Paper Details

Abstract
Extracting actionable insight from Electronic Health Records (EHRs) poses several challenges for traditional machine learning approaches. Patients are often missing data relative to each other; the data comes in a variety of modalities, such as multivariate time series, free text, and categorical demographic information; important relationships among patients can be difficult to detect; and many others. We propose a novel approach to address these first three challenges using a representation learning scheme based on graph neural networks. Our proposed approach is competitive with or outperforms the state of the art for predicting in-hospital mortality (binary classification), the length of hospital visits (regression) and the discharge destination (multiclass classification).

B. Malone, C. Tosch, B. Grellier, K. Onoue, T. Sztyler, K. Ritter, Y. Yamashita, E. Quemeneur, K. Bendjama: "Performance of neoantigen prediction for the design of TG4050, a patient specific neoantigen cancer vaccine", American Association for Cancer Research Annual Meeting AACR, April 2020

Paper Details

Abstract
The development of therapeutic cancer vaccines to immunize against tumor antigens constitutes a promising modality. Mutation-associated antigens are considered major targets given their specificity to tumor cells. These mutations are specific to each patient and require a tailor-made vaccine targeting the mutations identified in each tumor. Many mutations are identified in the tumoral genome in most patients, but only a small fraction (around 1%) is suitable as a vaccine target. Herein, we report data documenting the prediction performance of the algorithm used for the design of TG4050, a clinical-stage, patient-specific, viral-based neoantigen vaccine.

We have trained a set of independent machine learning algorithms to score each candidate neoantigen for several steps of the MHC antigen presentation pathway, including MHC binding, intracellular processing, similarity to self, and likelihood to elicit a T-cell response in peptide-stimulated ELISPOT. Further, we have developed a novel graph neural network to combine all these scores to predict the likelihood that a neoantigen will elicit a T-cell response while also incorporating patient-specific factors, such as expression level and conservation of the mutation across different clones. To validate the system, we collected samples from 6 patients diagnosed with NSCLC, sequenced healthy and tumor tissue, identified mutations and ranked them using our algorithm; then, to evaluate immunogenicity, we focused our analysis on CD8+ T cells and measured the frequency of IFN-γ+ cells against predicted peptides in autologous PBMC. Immunogenicity of peptides was assayed in 5 pools, then deconvoluted against individual peptides.

Between 3339 and 4782 somatic variants were detected in tumor tissue samples. After applying technical filtering, removing synonymous mutations, and filtering on transcript expression, we detected a median of 281 (192-471) expressed tumor mutations, resulting in a median of 2767 (1769-4573) candidate class I epitopes. The model achieved high accuracy, allowing us to identify peptides with pre-existing ex vivo immunogenic responses in 5 out of 6 patients. Immunogenicity of peptide pools was correlated with ranking by the algorithm. Among the 6 top-ranking individual epitopes in each patient, a median of 5 (2-6) peptides were immunogenic, corresponding to a true positive rate (TP) of 77%. It should be noted that when no response was detected, it cannot be excluded that a response could be primed by a vaccine. In a similar setting, the netMHC 4.0 algorithm yielded a TP of 30% and identified only 39% of the positive calls of our algorithm.

We demonstrate that the prediction algorithm is accurate in identifying immunogenic cancer mutations even among a large set of candidates. Ongoing TG4050 clinical studies (NCT03839524 and NCT04183166) will allow further validation of the antitumor activity of the elicited immune response.

Presented at: American Association for Cancer Research Annual Meeting AACR, April 2020

Paper available at: Cancer Res 2020;80(16 Suppl):Abstract nr 4566

Brandon Malone, Boris Simovski, Clément Moliné, Jun Cheng, Marius Gheorghe, Hugues Fontenelle, Ioannis Vardaxis, Simen Tennøe, Jenny-Ann Malmberg, Richard Stratford, Trevor Clancy: "Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2: toward universal blueprints for vaccine designs", Scientific Reports 2020

Paper Details

The global population is at present suffering from a pandemic of Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The goals of this study were to use artificial intelligence (AI) to predict blueprints for designing universal vaccines against SARS-CoV-2 that contain a sufficiently broad repertoire of T-cell epitopes capable of providing coverage and protection across the global population. To help achieve these aims, we profiled the entire SARS-CoV-2 proteome across the most frequent 100 HLA-A, HLA-B and HLA-DR alleles in the human population, using host-infected cell surface antigen presentation and immunogenicity predictors from the NEC Immune Profiler suite of tools, and generated comprehensive epitope maps. We then used these epitope maps as input for a Monte Carlo simulation designed to identify statistically significant “epitope hotspot” regions in the virus that are most likely to be immunogenic across a broad spectrum of HLA types. We then removed epitope hotspots that shared significant homology with proteins in the human proteome to reduce the chance of inducing off-target autoimmune responses. We also analyzed the antigen presentation and immunogenic landscape of all the nonsynonymous mutations across 3400 different sequences of the virus, to identify a trend whereby SARS-CoV-2 mutations are predicted to have reduced potential to be presented by host-infected cells, and consequently detected by the host immune system. A sequence conservation analysis then removed epitope hotspots that occurred in less-conserved regions of the viral proteome.
Finally, we used a database of the HLA genotypes of approximately 22 000 individuals to develop a “digital twin” type simulation to model how effectively different combinations of hotspots would work in a diverse human population, and used the approach to identify an optimal constellation of epitope hotspots that could provide maximum coverage in the global population. By combining the antigen presentation to the infected-host cell surface and immunogenicity predictions of the NEC Immune Profiler with a robust Monte Carlo and digital twin simulation, we have managed to profile the entire SARS-CoV-2 proteome and identify a subset of epitope hotspots that could be harnessed in a vaccine formulation to provide broad coverage across the global population.
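
To make the population coverage idea above concrete, the following is a minimal sketch under stated assumptions and not the NEC Immune Profiler or the paper's digital twin simulation. It assumes an individual counts as covered when at least one of their HLA alleles is predicted to present at least one selected hotspot, and it uses a simple greedy selection that is not guaranteed to be optimal. All names, genotypes, and hotspot-to-allele mappings are hypothetical toy data.

```python
from typing import Dict, List, Sequence, Set


def coverage(
    selected: Sequence[str],
    genotypes: Sequence[Set[str]],
    presented_by: Dict[str, Set[str]],
) -> float:
    """Fraction of individuals covered by the selected hotspots.

    Assumption: an individual is covered if any of their HLA alleles is
    predicted to present any selected hotspot.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    covered = sum(
        1
        for alleles in genotypes
        if any(presented_by.get(h, set()) & alleles for h in selected)
    )
    return covered / len(genotypes)


def greedy_select(
    candidates: Sequence[str],
    genotypes: Sequence[Set[str]],
    presented_by: Dict[str, Set[str]],
    k: int,
) -> List[str]:
    """Pick up to k hotspots, each time adding the one that raises coverage most."""
    chosen: List[str] = []
    for _ in range(k):
        remaining = [h for h in candidates if h not in chosen]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda h: coverage(chosen + [h], genotypes, presented_by),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy population: each individual is a set of HLA alleles.
genotypes = [
    {"A*02:01", "B*07:02"},
    {"A*01:01", "B*08:01"},
    {"A*02:01", "B*08:01"},
    {"A*03:01", "B*44:02"},
]
# Hypothetical predictions: which alleles present each hotspot.
presented_by = {
    "hotspot_1": {"A*02:01"},
    "hotspot_2": {"B*08:01"},
    "hotspot_3": {"A*03:01"},
}

chosen = greedy_select(list(presented_by), genotypes, presented_by, k=2)
print(chosen, coverage(chosen, genotypes, presented_by))
```

With a genotype database of the size mentioned above, the same coverage function could be evaluated over every individual to compare candidate hotspot combinations.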

Timo Sztyler, Brandon Malone: “Learning Embeddings from a Biomedical Knowledge Graph for Predicting Novel Relations”, GCB2019

Timo Sztyler, Carolin Lawrence, Brandon Malone: “Building a Biomedical Knowledge Graph and Predicting Novel Relations”, AKBC 2019

Alberto García Durán, Mathias Niepert, Brandon Malone: “MULTI-modal Knowledge Graph Completion to Predict Polypharmacy Side Effects”, DILS 2018
