This is the place for everything from before, during, and after IGDORE's ReproducibiliTea sessions, including (but not limited to) questions, networking, tips and recommendations, and continued arguments and discussions.
Welcome!
Thank you for a great session today, guys!
I somehow didn’t have time to introduce next session’s article, but I suppose that’s what happens when you’re busy having fun discussions!
Next session (2025-01-22T14:15:00Z), we’re discussing:
Smaldino, P. E., & McElreath, R. (2016). The Natural Selection of Bad Science.
Available here: https://royalsocietypublishing.org/doi/10.1098/rsos.160384#
See you in two weeks!
Thank you @NHyltse for the warm welcome and moderation! Thanks also go to @rebecca and the IGDORE global board for starting this initiative! I was very happy to e-meet you! Some resources I mentioned during our meeting:
SPEAKER: Nigreisy MONTALVO ZULUETA
INSTITUTION: INSTITUT IMAGINE
TITLE: Pathogenicity scoring of genetic variants through federated learning across independent institutions reaches comparable or superior performance than their centralized-data model counterparts
AUTHORS: (1) Nigreisy Montalvo, (1) Antonio Rausell, (2) Francisco Requena
(1) Clinical Bioinformatics Laboratory, Institut Imagine; (2) Weill Cornell Medicine, Institute for Computational Biomedicine
ABSTRACT: Machine Learning (ML) has emerged as a popular approach for the pathogenicity scoring of human genetic variants. However, existing ML methods are often trained on small cohort data from a single institution, resulting in poor accuracy when tested on external cohorts. Multi-institutional collaboration can increase data size and diversity, leading to more robust models. Yet, the centralisation of genetic data raises important concerns about data privacy and security. Federated Learning (FL) is an ML technique that allows multiple institutions to collaboratively train a model without raw data exposure. Despite enabling privacy-preserving collaborations, studies on FL for the pathogenicity prediction of genetic variants are currently lacking. In this work, we present a simulated FL study on the clinical interpretation of Copy Number Variations (CNVs), and coding and non-coding Single Nucleotide Variants (SNVs), from the ClinVar database. We show that federated models systematically outperform single-site models, and achieve similar or better performance than traditional centralised learning. In addition, we show that federated models exhibit more robustness than centralised models when an institution decides not to participate in the training. With our findings, we expect to incentivise the adoption of FL for establishing secure multi-institutional collaborations in human variant interpretation.
KEYWORDS: Federated Learning, human genetic variants, privacy-preserving machine learning
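For anyone who hasn’t met federated learning before, the key mechanic in the abstract above is that each institution trains on its own private data and only model parameters travel to a central server, which averages them. Here’s a minimal sketch of that loop (FedAvg-style) in Python with a toy logistic-regression “pathogenicity” model; everything in it (the three sites, the features, the labels) is made up for illustration and is not the authors’ actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_epoch(w, X, y, lr=0.1):
    """One SGD epoch of logistic regression on a single site's private data."""
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-(xi @ w)))  # predicted probability of "pathogenic"
        w = w - lr * (p - yi) * xi           # gradient step on the log-loss
    return w

# Three hypothetical institutions, each holding its own private (toy) variant data.
true_w = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
sites = []
for _ in range(3):
    X = rng.normal(size=(200, 5))         # made-up variant features
    y = ((X @ true_w) > 0).astype(float)  # made-up pathogenicity labels
    sites.append((X, y))

w_global = np.zeros(5)
for _ in range(20):  # communication rounds
    # Each site trains locally; raw data never leaves the institution.
    local_ws = [local_epoch(w_global.copy(), X, y) for X, y in sites]
    # The server only ever sees model parameters, which it averages.
    w_global = np.mean(local_ws, axis=0)

print("federated weights:", np.round(w_global, 2))
```

In realistic settings the server would weight each site’s parameters by its sample size; with equal-sized toy sites a plain mean suffices.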
Thanks a lot, @sivashchenko!
Thank you for a great discussion, everyone who showed up today!
Links and mentions in the Zoom chat:
Other things I remember we mentioned:
Please add more things in the comments if I forgot something!
Next time (2025-02-05T14:15:00Z), we’ll be discussing:
Albanese, F., Bloem, B. R., & Kalia, L. V. (2023). Addressing the “Replication Crisis” in the Field of Parkinson’s Disease. https://doi.org/10.3233/JPD-239002
Hope to see you there!
Thanks for the warm welcome and moderation, Natalie! On the topic of registered reports, the professor whose name I couldn’t bring to mind is Chris Chambers, who is currently at Cardiff University. I misspoke in saying that his students’ work is all done as registered reports; at least, I don’t see any RR recommendations for their work posted on the PCI Registered Reports site. His students’ projects are focused on the adoption and effects of RRs, however, e.g. a past doctoral student’s dissertation and a current student’s profile.
Simone asked how long the process of an RR takes. I looked into it, and here’s the gist of what I found:
The PCI RR author guide describes timelines for two kinds of review (section 2.20).
As for whether it is a suitable format for PhD projects, Chambers’ recent talks report that PhD students make up the largest group of Registered Reports submitters (47%).
More resources for those interested from a talk given a year ago:
[Edit 3/30: I mistakenly described Aoife O’Mahony’s thesis as an MSc thesis, but it is in fact a PhD dissertation.]
It was a real pleasure taking part in the ReproducibiliTea journal club. Thank you, Natalie, for facilitating. Simone and I would like to keep the club going, and we’re wondering if others would like to join us at the same time (biweekly on Wednesdays at 15:15 CEST, starting April 16th). We’d start with the paper “Reproducibility vs. Replicability: A Brief History of a Confused Terminology” (Plesser, 2018). Please reach out if you’re interested!
Also, since @alexbyrnes and others asked for a summary of the 3/5 meeting where @rickcarlsson, Zia, and I discussed “Open Code for Open Science?” (2014) by Steve Easterbrook, I’ve posted a summary of the paper and our discussion on my website here.
Awesome initiative, @Christian_N_Sodano! Very happy to hear that!
Will you also continue with the Open Science Coffee at the end of each session? If so, I look forward to seeing you there!
And please let me or @Gavin know if you don’t already have access to everything you need (e.g. participant email addresses, the Zoom account) to coordinate and host.
Just found an old paper by @Daniel-Mietchen which seems relevant to this thread: