Real-world repro-packs -- looking for examples

Hey pyOpenSci,

I’m looking for real-world examples of repro-packs or research compendiums: bundles of code, figure plotting scripts, data, and whatever else is needed to reproduce the results in a computational paper.

Reproducibility packages, a.k.a. repro-packs, is the term Lorena Barba’s group uses (http://blogs.nature.com/naturejobs/2017/04/17/techblog-my-digital-toolbox-lorena-barba/).
As an example, here’s a link to a GitHub repository that houses code for one of their papers (https://github.com/barbagroup/pygbe_lspr_paper) – there’s a link in the README pointing to the final repro-pack on Figshare.

Research compendium is another name for this practice, which I know from hearing Karthik Ram talk about it. Slides from a talk he gave are here: https://github.com/karthik/rstudio2019, with links to several real-world examples.

I’m specifically looking for examples that use Python (obvs), and from as many different research areas as possible.

Mainly what I’d like to see is how people structure code across research domains.
Recently I led a session on “Python 102” about the basics of organizing code (https://python-102.readthedocs.io/en/latest/packaging.html#), and I couldn’t find a lot of different examples, but maybe that’s because I don’t know where to look.

So I’m hoping to crowdsource those examples.

My idea is to maybe make them part of a blog post for pyOpenSci: not the be-all and end-all on the subject, but a mini-review that a student could at least look at to get ideas on how to package their code and share their results in a reproducible way.

What I am not looking for is source code for libraries, tools, etc., even though that is of course within the scope of pyOpenSci. Fairly simple examples without a ton of development-related cruft (Docker / Travis / environment.yml, etc.) in the root of the repository would be great.

Of course I love to see a beautifully organized codebase for a library, but the idea here is to show someone a real-world example from their field of what the simulation / analysis / figure scripts for their papers might look like: a bunch of concrete examples of the structure suggested in “Good Enough Practices in Scientific Computing” (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510), plus some inspiration for the whole repro-pack itself (figures are def field-specific too!).
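For concreteness, the layout that paper suggests looks roughly like this (directory and file names are illustrative, not from any particular repo):

```
project-name/
├── README.md          # what this is and how to reproduce the results
├── LICENSE
├── CITATION
├── requirements.txt   # or environment.yml
├── data/              # raw data, treated as read-only
├── src/               # simulation / analysis / figure scripts
├── results/           # generated outputs: figures, tables
└── doc/               # manuscript and notes
```

The examples I’m hoping to collect would show how people adapt (or depart from) something like this in their own fields.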

Thanks, looking forward to seeing what people share and hearing any comments
–David

I’m not sure how reproducible it is, since I don’t have an environment.yml, but https://bitbucket.org/story645/libltf has all the scripts for making figures for a paper on ensemble forecast evaluation (climate), with a link to the paper (which is in a different, private repo that I have to fight with Bitbucket about making public).


Nice! Sounds perfect – will give this a look
Thank you @story645

Getting back to this now @story645

Probabilistic forecasting sounds cool and it’s impressive to do that with just pure numpy and scipy.

I’m looking at the tools you used to build it too.
It looks like Distribute and Buildout provide some of the same functionality as, say, Flit or Poetry?
And modern-package-template is sort of like cookiecutter?
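(For anyone reading along later: those tools predate today’s packaging standards. A rough modern equivalent with Flit would be a single pyproject.toml in the repo root; this is just a sketch, and the name and metadata are placeholders, not from the libltf repo.)

```toml
# Minimal Flit-based packaging config (placeholder metadata)
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "mypackage"        # placeholder: must match your importable module
version = "0.1.0"
description = "Scripts and figures for a computational paper"
```

With something like this in place, `pip install .` or `flit build` should be enough to package the code.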

Thank you again for sharing this


(Slowly) posting some other links that people replied with on Twitter too.

Naty Clementi from Lorena Barba’s group gave an example of one of their recent papers, documenting an extension of their PyGBe library that enables applying computational nanoplasmonics to biomolecule detection.

Link to the pre-print on arXiv: https://arxiv.org/pdf/1812.10722.pdf

The repo (same one I posted above) itself includes a link to a repro-pack with figures on Figshare … but now I’m just noticing the brave blow-by-blow account of the submission process in the README, including drafting responses with GitHub issues.

Thanks, even though it’s mostly because the project predates xarray & pymc3. :wink:
And yeah, modern-package-template is a proto-cookiecutter, and Distribute & Buildout are part of the templating, so I dunno, but yes, they have to do with packaging.


Good to know – if you were starting now, you’d use xarray and pymc3? Are there probabilistic forecasting libraries that are wrappers around those? Just trying to get a feel for who uses what “core” libraries.

xarray is for n-dimensional labeled data, developed explicitly for climate data, so it would probably have made some of the data-alignment code simpler. PyMC3 is for probabilistic programming.
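To make the labeled-data point concrete, here’s a tiny sketch: with plain numpy you have to remember which axis is which, while xarray lets you select and reduce by name. (Toy values and coordinate names; nothing here is from the libltf repo.)

```python
import numpy as np
import xarray as xr

# 2 forecast times x 3 stations of made-up values
da = xr.DataArray(
    np.array([[0.1, 0.5, 0.9],
              [0.2, 0.4, 0.8]]),
    dims=("time", "station"),
    coords={"time": [2001, 2002], "station": ["a", "b", "c"]},
)

# Select by label instead of positional index
print(da.sel(time=2001, station="b").item())  # 0.5

# Reductions name the axis they collapse
print(da.mean(dim="time").values)  # [0.15 0.45 0.85]
```

The same alignment-by-name applies when combining arrays: xarray matches coordinates automatically rather than relying on axis order.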


Still slowly posting links from that Twitter thread (don’t mean to post over you @story645; thank you, that’s helpful to know. I need to take xarray for a test drive at some point).

@khinsen shared some examples of active papers on Zenodo:
https://zenodo.org/search?page=1&size=20&q=ActivePapers

but I think he got frustrated by the default limit on the number of links new members can post
(thanks to the googling skills of @chendaniely and the admin powers of @lwasser, we got this raised)

I wasn’t actually familiar with these, but am now.
Examples from Konrad’s own work:


And last but not least, Olivia Guest brought up ReScience C, which can be found here (saving you some cutting and pasting):
http://rescience.github.io/

if you haven’t seen ReScience C before, they describe themselves as “an open-access peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible.”

Importantly, all the articles have a GitHub repo associated with them; cf. this paper, which I actually spent a bunch of time looking at during my post-doc and was surprised to find there:
http://rescience.github.io/bibliography/lemasson_2016.html

NB: they’ll soon be presenting results from their ten-year reproducibility challenge:
http://rescience.github.io/ten-years/