I came across this handy tool for digitizing plots recently and found it very useful. It has enough automation to save a lot of time, but enough interactivity to correct mistakes easily.
It’s not really a replicability-focused tool, but it’s certainly useful for digitizing values from plots in papers that don’t provide raw data.
I think sharing transcribed data should be a bigger part of open science. Transcribing data takes a lot of time, so it would be best to minimize duplicate work where possible. I also note in README files any issues I noticed with the data, though this could also be done on post-publication peer review websites like PubPeer. I think putting the notice with the data is more convenient, however. And if the problem I noticed prompted me to modify the data, that is definitely worth noting with the data. (I imagine that some people have a different philosophy, where the data as published is itself an artifact that should not be changed, but I don’t share that philosophy.)
Also, if you want extra precision or find your hand shaking too much, look for a setting or program to do “mouse emulation”. This’ll allow you to move your cursor one pixel at a time with the keyboard.
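To give a rough sense of why a single pixel can matter, here is a minimal sketch of how a click position maps to a data value on a linear axis. This is my own illustration, not how the tool itself works; the function name `pixel_to_data` and the calibration numbers are made up for the example.

```python
def pixel_to_data(pixel, pixel_ref, data_ref):
    """Map a pixel coordinate to a data value on a linear axis.

    pixel_ref and data_ref are (first, second) pairs describing two
    calibration points on the same axis, e.g. two labeled tick marks.
    """
    (p0, p1), (d0, d1) = pixel_ref, data_ref
    # Linear interpolation between the two calibration points.
    return d0 + (pixel - p0) * (d1 - d0) / (p1 - p0)


# Hypothetical calibration: the tick labeled 0 sits at pixel 100 and
# the tick labeled 10 sits at pixel 600 (assumed values).
x_pixels = (100, 600)
x_values = (0.0, 10.0)

value = pixel_to_data(350, x_pixels, x_values)
# How far the digitized value moves if the click is off by one pixel.
one_pixel_error = abs(pixel_to_data(351, x_pixels, x_values) - value)
print(value, one_pixel_error)
```

With this made-up calibration, being off by one pixel shifts the value by about 0.02 axis units. The smaller the plot image is relative to the axis range, the larger that shift gets, which is when keyboard control of the cursor helps most.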
Wow, that’s a lot of papers you’ve transcribed data from!
This is a good point, and I don’t think I’ve ever heard anybody mention it before. I had planned to include the transcribed data in a supplement for what I publish, but this makes me think it would be better to put it in a separate place/repository and link to that from my paper. It seems like it would also be useful to put a link to the data in a PubPeer comment on the original publication, to make it more visible. Do you do that for the datasets you have on GitHub?
I haven’t put any comments on PubPeer for the data I’ve transcribed, but that’s a good idea. I’ve added this to my PubPeer to-do list, which I keep in my reference manager.