In this issue on GitHub the topic of multiple guidebooks came up via Pey Lian.
Numerous guides were identified including
It would be nice to inventory these guides and identify overlapping elements that we could merge into something more ecosystem friendly. We could even try to share text across guides and, in the pyOpenSci space, ensure that any text we use from other guides is clearly branded to identify who originally developed it.
Thank you for starting this discussion! While working on PlasmaPy’s contributor guide, I kept thinking about how so much of the content overlaps with different projects…while there are also many aspects that are unique to each project.
A point raised in this post is that we should distinguish between contributor/development documentation and packaging documentation. Perhaps we should focus on the packaging aspect first. (I was also thinking that we could create a guide on best practices for scientific Python, which could perhaps incorporate content from the Writing Clean Scientific Software slide deck…though we’d probably want to keep that distinct and discuss it in a separate topic.)
The PyPA packaging guide and Astropy dev guide have come up in these conversations too.
I’m wondering if we should start a page of Community Resources on Packaging and Distribution (similar to an Awesome repo) as a collecting point for all of these guides, presentations, etc.
One of the places we can start is by diving into the challenges for creating shared community packaging resources. These challenges probably correspond to the predominant Python packaging pain points (which maybe we should give the acronym pPppp since it’s going to come up a lot). I think that laying out the pPppp will probably help guide us in creating the guide.
Here are the pPppp that I can think of, primarily based on my own experience and what I’ve heard from others (including the pyOpenSci Slack).
- The Python packaging landscape itself is fragmented, and different projects use different tools.
- The Python packaging landscape is changing relatively rapidly.
- Different packages have different needs. Some are pure Python. Others use Cython, Fortran, and/or C/C++. Some packages require a large amount of data (e.g., atomic databases) and may have to download it somewhere from the World Wide Web™. It may be the case that the best packaging framework will not be the same for different projects.
- Established packages have often rolled their own solutions. For example, we have some custom GitHub Actions in PlasmaPy whose role could have been filled by some of the tools now available in the Python packaging landscape. These solutions are often coupled to each other.
- Package maintainers are often stretched too thin or are too burnt out to put much time into redoing packaging to fit a community standard, even though it would save time & effort in the long run. (I’ve often wondered if it would be possible to have dedicated software engineers who would specialize in Python packaging, and could do this for a bunch of different projects.)
Again, thank you for starting this discussion. This could be really helpful to our community!
This is a great discussion.
- @willingc i love the idea of collecting a list of existing packages. i’ll try to create a document where we can start to collect these. The one challenge with these lists sometimes (similar to affiliated package lists!) is: how do you know when a resource becomes dated, unmaintained, etc.? at the same time, knowing what’s out there is incredibly useful and we do want to link to other maintained resources!
- that list of challenges, @namurphy, is a real one! the package guide was inspired by items 1 and 2 in the list. This actually happened because (this relates to your #5) i was rebuilding a maintainer team on stravalib, and it was SO SO HARD to figure out how to use setuptools_scm / to figure out what build tools i should be using at all. most resources were dated as well, so i ended up piecing together info from several folks in our slack combined with some partially dated blog posts!! yikes!
Related to your #5, in rebuilding stravalib’s maintainer team, i became the dedicated infrastructure person. so i redid CI, the build system, and the release / versioning system + contributing dev docs. it’s actually nice to have a maintainer team where we have a few devs and then a dedicated infrastructure person. i know many maintainer teams are not big enough for that but it has worked well for us. and regardless i hear you. i spent a lot of time on infrastructure.
i will try to pull together some initial documents for us to collect package resources and such soon. i’m also really excited about your dev slides and think that would be a wonderful set of online resources. we are publishing things on datascienceskills.org when we start doing that and plan to allow for branding if the resources are written and contributed by another organization / group etc. i have some work to do to overhaul it (it’s in quarto). but it will get done!!