I came across this handy tool for digitizing plots recently and found it very useful. It has enough automation to save a lot of time, but enough interactivity to correct mistakes easily.

It’s not really a replicability-focused tool, but it’s certainly useful for digitizing values from plots in papers that don’t provide raw data.


These types of programs are great. I’m partial to this one: https://github.com/pn2200/g3data

I used g3data extensively during my PhD and have continued using it afterwards. You can see a significant fraction of the data I’ve transcribed here: https://github.com/btrettel/pipe-jet-breakup-data/tree/master/data

I think sharing transcribed data should be a bigger part of open science. Transcribing data takes a lot of time, so it would be best to minimize duplicate work where possible. I also note in README files any issues I noticed with the data, though this could also be done on post-publication peer review websites like PubPeer. I think putting the notice with the data is more convenient, though. And if the problem I noticed prompted me to modify the data, that is definitely worth noting with the data. (I imagine some people have a different philosophy, where the data as published is itself an artifact that should not be changed, but I don’t share that philosophy.)

Also, if you want extra precision or find your hand shaking too much, look for a setting or program to do “mouse emulation”. This’ll allow you to move your cursor one pixel at a time with the keyboard.
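To give a feel for why single-pixel control matters, here is a minimal sketch of the linear pixel-to-data mapping that plot digitizers generally rely on. This is not g3data's actual code. The function name `make_axis_map` and all the calibration numbers are made up for illustration, and it assumes linear (not logarithmic) axes.

```python
def make_axis_map(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis calibrated from two known points
    (hypothetical helper, not part of any digitizer's API).
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration points must be at different pixels")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Made-up calibration: x = 0 at pixel column 100, x = 10 at pixel column 600.
x_map = make_axis_map(100, 0.0, 600, 10.0)
# Pixel rows usually increase downward: y = 0 at row 450, y = 5 at row 50.
y_map = make_axis_map(450, 0.0, 50, 5.0)

# Data coordinates of a point clicked at pixel (350, 250).
print(x_map(350), y_map(250))

# Change in the x value caused by moving the cursor one pixel.
print(abs(x_map(351) - x_map(350)))
```

With this made-up calibration, the axis spans 10 data units over 500 pixels, so each pixel of cursor error shifts the transcribed x value by about 0.02. On a low-resolution scan that per-pixel step gets larger, which is where keyboard-driven cursor movement helps.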


Yay! Brings back memories… like 2014 :relaxed:

Plotly Blog - Automatically Grab Data From an Image with...

Wow, that’s a lot of papers you’ve transcribed data from!

This is a good point, and I don’t think I’ve ever heard anybody mention it before. I had planned to include the transcribed data in a supplement to what I publish, but this makes me think it would be better to put it in a separate repository and link to that from my paper. It also seems useful to put a link to the data in a comment on PubPeer for the original publication, to make it more visible. Do you do that for the datasets you have on GitHub?
