IGDORE ReproducibiliTea Thread

This is the place for everything before, during, and after IGDORE’s ReproducibiliTea sessions, including (but not limited to) questions, networking, tips and recommendations, and continued arguments and discussions.

:sparkles: Welcome! :sparkles:

1 Like

Thank you for a great session today, guys!

I somehow didn’t have time to introduce next session’s article, but I suppose that’s what happens when you’re busy having fun discussions :smile:

Next session (2025-01-22T14:15:00Z), we’re discussing:
Smaldino, P. E., & McElreath, R. (2016) The Natural Selection of Bad Science
Available here: https://royalsocietypublishing.org/doi/10.1098/rsos.160384

See you in two weeks!

3 Likes

Thank you @NHyltse for the warm welcome and moderation! Thanks also go to @rebecca and the IGDORE global board for starting this initiative! I was very happy to e-meet you! Some resources I mentioned during our meeting:

SPEAKER: Nigreisy Montalvo Zulueta
INSTITUTION: Institut Imagine
TITLE: Pathogenicity scoring of genetic variants through federated learning across independent institutions reaches comparable or superior performance than their centralized-data model counterparts
AUTHORS: (1) Nigreisy Montalvo, (1) Antonio Rausell, (2) Francisco Requena
AFFILIATIONS: (1) Clinical Bioinformatics Laboratory, Institut Imagine; (2) Weill Cornell Medicine, Institute for Computational Biomedicine
ABSTRACT: Machine Learning (ML) has emerged as a popular approach for the pathogenicity scoring of human genetic variants. However, existing ML methods are often trained on small cohort data from a single institution, resulting in poor accuracy when tested on external cohorts. Multi-institutional collaboration can increase data size and diversity, leading to more robust models. Yet, the centralisation of genetic data raises important concerns about data privacy and security. Federated Learning (FL) is an ML technique that allows multiple institutions to collaboratively train a model without raw data exposure. Despite enabling privacy-preserving collaborations, studies on FL for the pathogenicity prediction of genetic variants are currently lacking. In this work, we present a simulated FL study on the clinical interpretation of Copy Number Variations (CNVs), and coding and non-coding Single Nucleotide Variants (SNVs), from the ClinVar database. We show that federated models systematically outperform single-site models, and achieve similar or better performance than traditional centralised learning. In addition, we show that federated models exhibit more robustness than centralised models when an institution decides not to participate in the training. With our findings, we expect to incentivise the adoption of FL for establishing secure multi-institutional collaborations in human variant interpretation.

KEYWORDS: Federated Learning, human genetic variants, privacy-preserving machine learning
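
Since federated learning came up, here is a minimal sketch of the core idea behind the technique the abstract describes: federated averaging (FedAvg, McMahan et al., 2017), where each institution trains on its own private data and only model weights are shared with a central aggregator. This is an illustrative toy in Python/NumPy, not the authors’ actual pipeline; the logistic-regression model, the simulated cohorts, and all parameter values are assumptions for demonstration only.

```python
# Minimal federated averaging (FedAvg) sketch in NumPy.
# Toy setup: each "institution" holds a private cohort and trains a
# logistic-regression scorer locally; only model weights are shared.
import numpy as np

rng = np.random.default_rng(0)
DIM = 5  # number of (made-up) variant features

def make_site_data(n):
    """Simulate one institution's private cohort (features + labels)."""
    X = rng.normal(size=(n, DIM))
    true_w = np.arange(1, DIM + 1, dtype=float)
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few gradient steps on local data only; raw data never leaves."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        w = w - lr * X.T @ (p - y) / len(y)  # logistic-loss gradient
    return w

sites = [make_site_data(n) for n in (200, 80, 150)]  # unequal cohorts
w_global = np.zeros(DIM)

for _ in range(20):  # communication rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in sites]
    sizes = [len(y) for _, y in sites]
    # Server step: average the local models, weighted by cohort size.
    w_global = np.average(local_ws, axis=0, weights=sizes)

print("federated model weights:", np.round(w_global, 2))
```

The point that matches the abstract: the raw data (X, y) never leave each site; only the weight vectors travel to the aggregator, which averages them weighted by cohort size.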

2 Likes

Thanks a lot, @sivashchenko!

:sparkles: Thank you for a great discussion, everyone who showed up today! :sparkles:

Links and mentions in the Zoom chat:

Other things I remember we mentioned:

Please add more things in the comments if I forgot something! :blush:


Next time (2025-02-05T14:15:00Z), we’ll be discussing:
Albanese, F., Bloem, B. R., & Kalia, L. V. (2023) Addressing the “Replication Crisis” in the Field of Parkinson’s Disease https://doi.org/10.3233/JPD-239002

Hope to see you there!

3 Likes

Thanks for the warm welcome and moderation, Natalie! On the topic of registered reports, the professor whose name I couldn’t bring to mind is Chris Chambers, who is currently working at Cardiff University. I misspoke in saying that his students’ work is all done as registered reports; at least, I don’t see any RR recommendations for their work posted on the PCI Registered Reports site. His students’ projects do focus on the adoption and effects of RRs, however (e.g. a past doctoral student’s dissertation, a current student profile).

Simone asked how long the process of an RR takes. I looked into it, and here’s the gist of what I found:

The PCI RR author guide describes timelines for two kinds of review (section 2.20):

  • Standard review: 4–8 weeks before the initial reviewer decision
  • Scheduled review: 1–6 weeks until the initial reviewer decision. In this review style, reviewers are selected based on a one-page snapshot of the project proposal, and the manuscript is submitted after the reviewers have been selected. Chambers reported (about a year ago) that in this review style the average time between manuscript submission and reviewer decision is 18 days.

As for whether the RR format is suitable for PhD projects, Chambers’ recent talks report that PhD students make up the largest group of Registered Report submitters (47%).

More resources for those interested from a talk given a year ago:

  • Explanation of how registered reports “2.0” (Scheduled Review) is faster (video) timestamp 35:55-39:07
  • Example timeline for Postdoc / PhD (video) timestamp 42:03-46:40

[Edit 3/30: I mistakenly described Aoife O’Mahony’s thesis as an MSc thesis, but it is in fact a PhD dissertation.]

2 Likes

It was a real pleasure taking part in the ReproducibiliTea journal club. Thank you, Natalie, for facilitating. Simone and I would like to keep the club going, and we’re wondering if others would like to join us at the same time (biweekly on Wednesdays at 15:15 CEST, starting April 16th). We’d start with the paper “Reproducibility vs. Replicability: A Brief History of a Confused Terminology” (Plesser, 2018). Please reach out if you are interested!

Also, since @alexbyrnes and others asked for a summary of the 3/5 meeting where @rickcarlsson, Zia and I discussed “Open Code for Open Science?” (2014) by Steve Easterbrook, I’ve posted a summary of the paper and our discussion on my website here.

2 Likes

Awesome initiative, @Christian_N_Sodano! :partying_face: Very happy to hear that!

Will you also continue with the Open Science Coffee at the end of each session? :hot_beverage: If so, I look forward to seeing you there! :grinning_face:

And please let me or @Gavin know if you don’t already have access to everything you need (e.g. participant email addresses, Zoom account) to coordinate and host.

Just found an old paper by @Daniel-Mietchen which seems relevant to this thread: