IGDORE ReproducibiliTea Thread

This is the place for everything before, during, and after IGDORE's ReproducibiliTea sessions, including (but not limited to) questions, networking, tips and recommendations, and continued arguments and discussions.

:sparkles: Welcome! :sparkles:


Thank you for a great session today, guys!

I somehow didn’t have time to introduce next session’s article, but I suppose that’s what happens when you’re busy having fun discussions :smile:

Next session (2025-01-22T14:15:00Z), we’re discussing:
Smaldino, P. E., & McElreath, R. (2016). The Natural Selection of Bad Science.
Available here: https://royalsocietypublishing.org/doi/10.1098/rsos.160384#

See you in two weeks!


Thank you @NHyltse for the warm welcome and moderation! Thanks go to @rebecca and the IGDORE global board for starting this initiative! I was very happy to e-meet you! Here is a resource I mentioned during our meeting:

SPEAKER: Nigreisy MONTALVO ZULUETA
INSTITUTION: INSTITUT IMAGINE
TITLE: Pathogenicity scoring of genetic variants through federated learning across independent institutions reaches comparable or superior performance than their centralized-data model counterparts
AUTHORS: Nigreisy Montalvo (1), Antonio Rausell (1), Francisco Requena (2)
(1) Clinical Bioinformatics Laboratory, Institut Imagine
(2) Weill Cornell Medicine, Institute for Computational Biomedicine

ABSTRACT: Machine Learning (ML) has emerged as a popular approach for the pathogenicity scoring of human genetic variants. However, existing ML methods are often trained on small cohort data from a single institution, resulting in poor accuracy when tested on external cohorts. Multi-institutional collaboration can increase data size and diversity, leading to more robust models. Yet, the centralisation of genetic data raises important concerns on data privacy and security. Federated Learning (FL) is an ML technique that allows multiple institutions to collaboratively train a model, without raw data exposure. Despite enabling privacy-preserving collaborations, studies on FL for the pathogenicity prediction of genetic variants are currently lacking. In this work, we present a simulated FL study on the clinical interpretation of deletion Copy Number Variations (CNVs), and coding and non-coding Single Nucleotide Variants (SNVs) in the ClinVar database. We show that federated models systematically outperform single-site models, and achieve similar or better performance than traditional centralised learning. In addition, we evidence that federated models exhibit more robustness than centralised models when an institution decides not to participate in the training. With our findings, we expect to incentivise the adoption of FL for establishing secure multi-institutional collaborations in human variant interpretation.

KEYWORDS: Federated Learning, human genetic variants, privacy-preserving machine learning
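
For anyone new to federated learning, here is a rough toy sketch of the general idea in the abstract: each site trains on its own data and shares only model weights, which a coordinator averages. Everything in it is an assumption for illustration: the abstract does not say which aggregation scheme the study uses, so this uses plain federated averaging on a one-variable linear model with made-up data, and the function names (`local_update`, `federated_average`, `train_federated`, `make_site`) are hypothetical.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs on one site's private data.

    Only the updated (w, b) pair is returned; the raw data stays local.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y  # prediction error for this sample
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Average the sites' weights, weighted by their sample counts."""
    total = sum(site_sizes)
    w = sum(wt[0] * n for wt, n in zip(site_weights, site_sizes)) / total
    b = sum(wt[1] * n for wt, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def train_federated(sites, rounds=200):
    """Alternate local training and averaging for a fixed number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


def make_site(n, lo, hi, seed):
    """Made-up data: each site sees a different x range of y = 2x + 1."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(lo, hi) for _ in range(n))]


# Three simulated "institutions" with different sizes and data ranges.
sites = [
    make_site(20, -1.0, 0.0, seed=1),
    make_site(30, 0.0, 1.0, seed=2),
    make_site(10, -1.0, 1.0, seed=3),
]
w, b = train_federated(sites)
print(f"w={w:.4f}, b={b:.4f}")  # should end up close to w=2, b=1
```

The point of the sketch is only that `train_federated` never touches another site's samples directly; a real study would add a proper model, evaluation on held-out cohorts, and likely secure aggregation on top.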


Thanks a lot, @sivashchenko!