Suggestions for a conda user preparing a package for PyPI

Hello,

My collaborators and I have been preparing code for submission to JOSS and I was very happy to discover this forum, since it looks like it is designed (in part) precisely to help with the packaging problems that we are confronting. We have been using conda for all of the development, and have only included pip in the conda environment.yml file to make the installation editable. None of us have used any other environment or package manager, and now that we are preparing to submit our work to PyPI (and possibly pyOpenSci, now that I’ve learned of their partnership :sunglasses:), we could use a bit of help with setting things up. The pyOpenSci Python Packaging Guide has been really helpful, but we still have a few questions that I’m hoping someone here can help us with.

  1. I assume that we will want to create separate environments for PyPI and conda-forge builds of the same repository? Does it make sense to use one of the more PyPI-oriented environment managers for the PyPI build, since pip installations are discouraged in conda?

  2. Our repo is written entirely in Python, but it depends on SciPy and NumPy. Does that dependency limit us to tools that can handle packages that are not pure Python?

  3. When building for both conda-forge and pip, it looks like we will need to use both environment.yml and pyproject.toml to manage our dependencies. Is there a way to reduce that redundancy, or at least minimize the potential for mismatches?

Thanks in advance!

Steve


Hi Steve; I’m just a member of the community, but I’d be happy to help answer your questions.

As a preface: while I should be able to answer your questions for the general case, when asking these sorts of questions it’s always a good idea to link the repo of the project you’re asking about, so that those responding can make sure their suggestions and recommendations fit your actual situation, and they can often offer other advice and tips as well.

While “environment files” like requirements.txt for pip and environment.yml for Conda are useful when sharing your scripts with colleagues, when deploying applications, or sometimes when installing development builds of packages, they aren’t what is used (or in the case of environment.yml, involved at all) when packaging your project for PyPI or Conda-Forge. The main purpose of those files is to construct or reconstruct, with varying degrees of precision, a specific environment with specified versions of specific Python (or, with environment.yml, also non-Python) packages installed. Generally speaking, the key goal here is reproducibility: the more precisely they reproduce the specific environment, the better they serve that purpose.
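
For illustration, a minimal environment.yml serving that reproducibility purpose might look like this (the name and pins here are hypothetical placeholders, not recommendations):

name: my-analysis-env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy=1.26.4   # exact pins make the rebuilt environment more reproducible
  - scipy=1.11.4
  - pip
  - pip:
      - -e .       # the editable install of the project itself, as you described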

One or more requirements.txt and/or environment.yml files can sometimes be useful for contributors wanting to get a dev environment set up, for services like Binder, for people looking for a “prebuilt” environment, or when deploying as an application. However, those are separate, more specialized use cases, distinct from the PyPI or Conda-Forge packaging that it sounds like you’re asking about.

When you’re packaging your project for a package index like PyPI or Conda-Forge, instead of starting from a specific target user environment, you want to look at things from the other direction and just specify your project’s dependencies: the packages your project needs to run, and (optionally, but recommended) the versions of those dependencies it requires. Here, the goal is compatibility: constrain the dependency packages and their versions only as much as necessary for your package to work, to minimize the chances of your package’s dependencies being declared incompatible with those of another. These are listed in project.dependencies in the pyproject.toml for most modern PyPI packages, and under the requirements → run key of the Conda-Forge feedstock recipe/meta.yaml. (And they do need to be specified separately, for the reasons I’ll explain in answering question 3.)
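
As a rough sketch (the package names match yours, but the version bounds are illustrative assumptions, not recommendations), the same runtime dependencies would be declared like this in pyproject.toml:

[project]
name = "your-project"
dependencies = [
    "numpy>=1.22",   # loose lower bounds, not exact pins
    "scipy>=1.8",
]

…and like this under requirements → run in the feedstock’s recipe/meta.yaml:

requirements:
  run:
    - python >=3.9
    - numpy >=1.22
    - scipy >=1.8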

So, TL;DR: at least on your end, when packaging your project for distribution on PyPI or Conda-Forge (or to be installed manually with pip install -e .), there aren’t actually going to be any “environments” (or environment files) involved. Environments are created on the target machine, by and under the control of the user or platform, not your project; and environment files are used to construct them, not to specify the dependencies of a PyPI or Conda-Forge package.

A (Python) environment manager, like virtualenvwrapper, Tox/Nox, Conda, etc., or those built into “all-in-one” tools like Poetry, Hatch, PDM, etc., is a tool used on the user’s machine to create, modify, and switch between (Python) environments. While you might find one useful during development, and your users might install your package into an environment using one, it doesn’t really make a difference when you’re building your package for distribution on PyPI or Conda-Forge.

Nowadays, for PyPI, your package is actually built in an isolated environment automatically created by the build frontend tool (e.g. the standard PyPA build, or an “all-in-one” frontend + backend like Hatch, Flit, PDM, Poetry, etc.), and for Conda-Forge it’s built on the Conda-Forge CI servers. Neither any environment manager you may choose to use on your dev machine, nor any the end user may be using on theirs, has any role in this process.
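
For example, with the standard PyPA frontend, the whole build is a single command run from your project root:

# build creates a temporary isolated environment, installs your declared
# build dependencies into it, and leaves an sdist and a wheel under dist/
python -m build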

As for pip installs being discouraged in Conda environments: I’m not totally sure how that relates to using an environment manager, or to distributing your project on PyPI or Conda-Forge. But just to clarify, while yes, you should generally avoid mixing pip and conda in the same environment if possible, pip install -e . is still the standard way to do an editable development install of your project in a Conda environment, at least for a pure Python project like yours. So long as you’ve either installed all its dependencies with Conda (with, e.g., conda install --only-deps your-project-name, or from a requirements.txt or environment.yml provided for this purpose), or only installed Python itself with Conda and installed your project and all its dependencies with pip, e.g.

# create a fresh env containing only Python (and its own deps)
conda create -n your-new-env python
conda activate your-new-env
# pip then installs your project and all of its dependencies
pip install -e .

…then you shouldn’t have any problems.
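
The first option would look something like this instead (a sketch, assuming you ship an environment.yml listing your dependencies, like the hypothetical one above):

# create the env and install all the deps with Conda
conda env create -f environment.yml
conda activate my-analysis-env   # name taken from the yml's "name:" key
# install only your project itself; leave the Conda-installed deps alone
pip install -e . --no-deps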

Also, just to note, the former (minus the editable flag) is basically what Conda-Forge does internally to create your package: it installs all the specified deps with Conda, pulls the package source from PyPI, runs pip install . --no-deps to do the actual build and install of your project on CI, and then packages up the resulting files. That’s why setting up your pyproject.toml and ensuring your package is buildable with the standard Python tools and installable with pip is basically necessary (though not entirely sufficient) to get your package on Conda-Forge: Conda-Forge (through pip) reuses most of that, minus the runtime dependencies.
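
For reference, the corresponding fragment of a typical pure-Python feedstock recipe looks something like this (an illustrative sketch, not exactly what yours will contain):

build:
  noarch: python
  script: "{{ PYTHON }} -m pip install . -vv --no-deps"  # the pip step described above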

On your second question: nope. So long as your package itself is pure Python, with no C, Fortran, Cython, Pythran, or other compiled code, it makes no difference for your packaging whether your dependencies are compiled; you’ll still be able to build with pure-Python build tooling on the Python/PyPI side, and as noarch on the Conda-Forge side.

In particular, pure Python packages that depend on NumPy generally don’t need to do anything special with their dependencies beyond what any other pure Python package does: just specify them as normal. The resulting pure-Python wheel will work on any platform, architecture, and Python version. NumPy does have some recommendations for NumPy version support, but they are just best-practice recommendations for declaring the versions you support, and they come with some useful links to additional docs on how SciPy handles things.

As you hopefully gleaned from my rather lengthy answer to question 1, you aren’t going to be working with environment.yml at all when specifying dependencies for either Conda-Forge or PyPI. For most modern PyPI packages (particularly pure Python ones like yours), you’ll list them under project.dependencies in the pyproject.toml, as you’re probably already familiar with. For Conda-Forge, they are listed under the requirements → run key of the recipe/meta.yaml file in the Conda-Forge feedstock, a repository that the Conda-Forge tooling automatically creates for you when you submit your package for inclusion there. Generating the recipe itself can be mostly automated with tools like Grayskull, at least for pure Python packages like yours.
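
If it helps, generating a draft recipe is basically a one-liner (this assumes your package is already published on PyPI; your-project-name is a placeholder):

pip install grayskull
# writes a draft recipe/meta.yaml based on your latest PyPI release
grayskull pypi your-project-name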

If you aren’t yet familiar with the details of Conda-Forge packaging, it’s a lot easier than it sounds and will make more sense once you do it, but the key takeaway is that your package configuration for Conda-Forge is managed independently of your source package/repository, in a separate Conda-Forge-managed repo. This means you don’t actually need to do anything in your source repo for it to work, and you can help package other people’s projects, and other people can help you package yours (just like with other package ecosystems, like those for Linux distros, Homebrew/MacPorts, Chocolatey, etc.).

Those do have to be specified separately, because PyPI and Conda-Forge are two separate package indices with two separate namespaces, different build environments, different ways of doing things, and different scopes. Packages may have different names and possibly different dependencies on each (for various reasons), and Conda-Forge has a lot more than Python packages that your package might want to depend on, whereas for PyPI you’d have to bundle (or “vendor”) those binary dependencies into your built wheel. It’s a bit like listing the social media handles of your maintainers on Twitter vs. Mastodon: sure, many of them might be the same, but you certainly can’t just assume that, and there are likely to be differences for various reasons.

It does result in some duplication, but Conda-Forge has automated tooling to detect missing or unnecessary deps in new versions, and if you run the project’s test suite as part of the recipe, generally speaking any missing dep should trigger an error automatically. In practice, I find the extra work relatively minimal, and in fact a great opportunity to audit, discover and fix mistakes in the “upstream” PyPI package’s dependencies.


This is just what I needed, thanks! I see now that I misunderstood the role of requirements.txt and environment.yml. As for my repo, it’s not public yet but should be soon, so I’ll be sure to point to it next time if I have any other questions.


hey @jsdodge :wave: just wanted to welcome you here to pyOpenSci!

If anything here is still confusing about packaging please ask more questions and we can help clarify!

I am actually working on some content for the conda-forge / conda part of our docs. But generally your workflow will be:

  1. Publish to PyPI. To do this you do need a pyproject.toml file, as this is where you store both your dependencies AND your project’s metadata, which PyPI will list for your project! After you do this, you can then submit your package to conda-forge using the Grayskull package, which will create your conda recipe for you. If you publish on both PyPI and conda-forge, users who work in conda environments can conda install your package, avoiding the issues that can come up when you have to pip install things into a conda environment. (See the sketch after this list.)
  2. As C.A.M. mentioned above, because your package is pure Python and doesn’t wrap any other languages directly (but just relies on numpy / scipy, which do have non-Python dependencies), you can use ANY TOOL that you wish!! yay! So pick the tool whose interface you like best. There is no wrong answer here. :tada:
  3. I think C.A.M. covered this really well. But to summarize: you declare your dependencies in the pyproject.toml. That is the file your build tool needs both to create your package and to declare the dependencies in your package’s metadata.
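
Putting those steps together, the pipeline is roughly this (the tools are real; your-project-name is a placeholder, and the upload assumes you have a PyPI account/token set up):

python -m build                   # build the sdist and wheel into dist/
twine upload dist/*               # publish the release to PyPI
grayskull pypi your-project-name  # then draft the conda-forge recipe to submit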

for your own local development, and in the instructions you provide for users who want to install your tool in a dev environment, feel free to use whatever environment tool you prefer! i love conda myself and sometimes use the yml file to create dev environments. it’s a personal preference!

let us know if you have any other questions!! we are very lucky to have CAM in our community answering questions given his work in the core Python community among other places.


Thanks @lwasser, both for the welcome and for your additional help. You anticipated my questions about publishing to conda-forge and reinforced my understanding of @CAM-Gerlach’s post, which was already extraordinarily detailed and clear. I’ve clearly found the right place!

As a side note, I appreciate the attention that you have given to emphasizing the inclusive aspect of “Open” in the pyOpenSci community. It really makes a difference. :clap:

(And as I’m sure you know, it’s sorely needed elsewhere—see here, for example. :roll_eyes:)


hey there @jsdodge thank you for the kind words!! i know it can be really hard. i think there is a tension: maintainers have to deal with a LOT of users who can be (understandably) demanding, but those maintainers are also volunteers with limited bandwidth. Also, not all computer scientists and developers can be great at both development and customer service! it’s so so hard. but i hear you!! and we do want to be helpful, reduce that burden where we can, and help users!!

Another note - for our peer review, be sure to have a look at our submission template prior to submitting, to ensure your package meets our initial checks.

we are super happy to support / answer questions here about packaging at any time! our review is just a bit different from JOSS in that we want to build community around packages that are maintained over time with a scientific focus and scope. Hopefully this helps!!

let’s continue to stay in touch here!! By the way - how did you hear about us? :slight_smile:


Hi @lwasser, thanks for the pointer to the template. And I get that any particular project can’t be all things to all people—I really just meant to highlight the value of what you are doing here and that other communities (not least my own community of physicists) could do more of it.

I was aware of pyOpenSci from internet searches and the JOSS documentation, but I didn’t take the time to learn more about it until a member of my university’s library staff pointed me to your development guide. One reason that I didn’t look into pyOpenSci earlier is that the JOSS documentation made it sound like submitting work to JOSS or to pyOpenSci was an either/or proposition, which I now see was mistaken.


Hey @jsdodge thank you for pointing out the JOSS documentation issue. i had noticed that and wondered how people were reading that statement. it’s definitely not either/or for pyOpenSci or rOpenSci.
i’ll mention this to the JOSS folks just so they are aware!! Many thanks for bringing this to our attention!!