Preprint servers face closure because of money troubles

It seems to me you could build a torrent application specifically for this.

First, start with a global area network. Any device loading the program automatically connects to this GAN; this way you avoid the need for centralized discovery servers. You may need to add TURN/STUN code to handle NAT traversal, etc.
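To make the connectivity piece a bit more concrete, here is a minimal sketch (TypeScript against the browser WebRTC API) of what the NAT-traversal configuration might look like. The STUN/TURN endpoints and credentials are placeholders, not real infrastructure, and peers would still need some signaling path to exchange their initial offers/answers, e.g. through peers that are already connected.

```typescript
// Minimal sketch: a browser peer configured for NAT traversal.
// The STUN/TURN URLs and credentials below are placeholders.
const config: RTCConfiguration = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" }, // discover the peer's public address
    {
      urls: "turn:turn.example.org:3478",   // relay fallback when direct traversal fails
      username: "preprint-peer",
      credential: "placeholder-secret",
    },
  ],
};

const peer = new RTCPeerConnection(config);

// A data channel carries paper chunks once two peers are connected.
const channel = peer.createDataChannel("papers");
channel.onopen = () => console.log("peer link established");
channel.onmessage = (event) => console.log("received chunk", event.data);
```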

Build a new client and then compile it to WebAssembly. For file storage, IndexedDB is supported by all major browsers and provides a sufficiently large capacity. At this point you're left with a web page (running a compiled program) to search for, display, and download papers. This single file, index.html, can then be statically hosted on any CDN, for example CloudFront, which will serve up to 2 million requests for free.
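And a minimal sketch of the storage side: caching a downloaded paper in IndexedDB so the page can serve it to other peers later. The database and object store names here are purely illustrative.

```typescript
// Minimal sketch: cache a downloaded paper in IndexedDB so the page can
// re-seed it later. Database and object store names are illustrative.
function savePaper(id: string, pdfBytes: ArrayBuffer): Promise<void> {
  return new Promise((resolve, reject) => {
    const open = indexedDB.open("preprints", 1);
    open.onupgradeneeded = () => open.result.createObjectStore("papers");
    open.onerror = () => reject(open.error);
    open.onsuccess = () => {
      const tx = open.result.transaction("papers", "readwrite");
      tx.objectStore("papers").put(pdfBytes, id);
      tx.oncomplete = () => resolve();
      tx.onerror = () => reject(tx.error);
    };
  });
}
```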

Maybe the OSF has considered approaches like this? Perhaps it is infeasible in one way or another, but I just thought I would throw the idea out there. I've pitched a similar scheme to my lab as a way to avoid the computational costs of scientific calculations (run them on the client in compiled WebAssembly rather than on centralized server infrastructure); unfortunately, other considerations have stood in the way of implementing that approach.

2 Likes

Like @Enrico.Fucci, I also found Brian's bigger-picture article about pre-prints to be very informative about their potential to reform the academic publishing system by separating publishing from evaluation.

The following article may also be of interest; it suggests that simply taking pre-prints directly to Twitter has already started to provide a good alternative to peer review as a form of evaluation.

2 Likes

@grant I have been thinking about this and I believe it is a very good idea! Taking the load off servers and sharing preprints through P2P seems like a very valuable option! What could be the arguments against it?

  • The document could get lost at some point (?). In my opinion, that would not happen, because the original author, or somebody else in the network, will always have a copy of the preprint. Even if an indexing platform is shut down, the hash address of the torrent is always available. Finally, if something really is lost, it means that it was not of much interest. I imagine that interesting works would get a lot of shares and therefore would not get lost. This could actually be a good system for filtering publications.

  • The original file can be modified (?). I am not sure about this, but could some form of blockchain-style cryptography be implemented?

  • Universities do not allow P2P on their networks. This is a fact at many institutions. But rules can be changed.

I am eager to discuss more!

@brucecaron, here's one!

Cool. Thanks!

@rebecca, @Enrico.Fucci, @brucecaron Project Ajur, run by Iris.ai, might also be of interest. To be honest, I don't know if it's going anywhere, as they don't seem to have updated it in a while (AI-assisted article discovery is their main focus), but it might have some interesting information.

1 Like

@Enrico.Fucci Cool! This is an exercise in thinking out loud :slight_smile:

Ideally, you would construct this as a sharded, distributed, replicated database. The downside? That is a very difficult type of code to write correctly. It means every connection would replicate a certain fraction of the database, and you would need a certain number of redundant connections to do this in a statistically correct/reliable way. You would also (I think) need to stand this up with dedicated seed servers until the traffic and number of connections were stable enough to ensure correctness when relying solely on distributed data sharing; then you could pull down the seed servers.
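As a rough sketch of what "replicating a certain fraction of the database" could mean in practice, each peer might derive a handful of shards to hold from its own ID. The shard count and shards-per-peer numbers below are arbitrary illustrative choices, not tuned values.

```typescript
// Rough sketch: a naive scheme for deciding which slices of the paper
// index a given peer replicates. Constants are illustrative, not tuned.
const SHARD_COUNT = 256;    // the index is split into this many shards
const SHARDS_PER_PEER = 8;  // each peer volunteers to hold this many shards

// Map a paper's content hash (hex string) to the shard that indexes it.
function shardForPaper(contentHashHex: string): number {
  return parseInt(contentHashHex.slice(0, 8), 16) % SHARD_COUNT;
}

// Derive the shards a peer replicates from its own ID; with enough peers
// online, every shard should end up held by many peers.
function shardsForPeer(peerIdHex: string): number[] {
  const base = parseInt(peerIdHex.slice(0, 8), 16);
  return Array.from(
    { length: SHARDS_PER_PEER },
    (_, i) => (base + i * Math.floor(SHARD_COUNT / SHARDS_PER_PEER)) % SHARD_COUNT,
  );
}
```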

Checksums would take care of the file-integrity question, without needing to invoke the heavy machinery of a blockchain. You could certainly (and probably should) encrypt all of this, which would make deep packet inspection very difficult and the application hard to block, since it would operate over the standard web ports (see below).
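For instance, the checksum step could be a few lines against the browser's Web Crypto API. Here the expected SHA-256 digest is assumed to come from the paper's index entry, which is my own illustrative assumption about how the index would be laid out.

```typescript
// Sketch of the checksum idea: verify a downloaded paper against the
// SHA-256 digest recorded for it before storing or re-seeding it.
async function verifyPaper(pdfBytes: ArrayBuffer, expectedHex: string): Promise<boolean> {
  const digest = await crypto.subtle.digest("SHA-256", pdfBytes);
  const actualHex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  return actualHex === expectedHex.toLowerCase();
}
```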

This is the neat part about doing it via a web page: all the traffic goes over the standard web ports (80 for HTTP, 443 for HTTPS), whereas most firewalls block traffic on the other ports a torrent application would typically use. The disadvantage is that the application would only be running while a user has the web page loaded, giving it less time to act as a seed (a normal torrent application can run as a background process on a user's computer). With this in mind, it would probably be necessary to find a way to run this as a background process on some subset of users' hardware without port restrictions; I'm not sure, though.

I guess one solution would be that you access the files via the web when connecting from an institution, but seed files when using your laptop at home or elsewhere. Another solution is that institutions allow P2P connections, which would not be too bad.

I am curious to know the opinion of @briannosek and others on this. It might be a very valuable way to host preprints for those archives that cannot manage to pay for a cloud server. It also does not seem too complicated to put in place if a credible institution is already supporting it (e.g. OSF?).

P2P approaches have been effective for a variety of sharing use cases. I don't see why they wouldn't also be effective for sharing preprints/papers. So it is certainly a reasonable thing for someone to explore.

Also, in the thread there were questions about OSF self-hosting. Los Alamos National Labs in the U.S. installed their own OSF for security reasons; they use it internally, not public-facing. And NII in Japan is hosting their own for Japanese research. Because of government restrictions, they had to install and maintain their own.

1 Like

@briannosek: That is great news. I have issues with Indonesia's national repository, especially its terms and conditions, but I am postponing that to respond to the pandemic. :slight_smile:

I think that eLife's current move towards solely reviewing pre-prints has a lot of synergy with the post that Brian linked to.

1 Like