Finding a specification for repository metadata

In a recent meeting, we discussed what kind of pattern we’d like authors to follow to add metadata to their repositories. My inclination is to try to follow what others are already doing. The rOpenSci community uses a codemeta.json pattern and also provides some tooling to create and manipulate this file.

It looks like there are already some attempts to do this in Python. So perhaps we should look into this as a pattern to suggest for our projects.

I kind of like the approach that the codemetar project uses: it makes it easy to create a codemeta file using pre-existing patterns in the R community (like a DESCRIPTION file). In Python we have the challenge that there are lots of ways to specify repository metadata from a packaging standpoint, so a tool that generates a codemeta file could be an abstraction that lives on top of all of these different packaging approaches.

I’m all for a JSON-based metadata standard for packages!! It’s language agnostic :slight_smile: Thanks for posting this.

I would consider the existing places that project-level metadata is stored, such as pyproject.toml and setup.cfg. There is a discussion of the pros and cons of TOML-vs-JSON-vs-configparser in PEP 518, which might be worth considering, if JSON isn’t set in stone.

If JSON is already established for codemeta, that’s definitely a point in its favor.

Python packaging practices seem to go through phases of proliferation and consolidation of config files, and my impression is that a lot of tools are moving toward pyproject.toml.

Agreed @choldgraf - CodeMeta is a nice standard to follow.

The codemetapy package looks promising. It would be great to test whether it works for multiple use cases (i.e., packages with metadata in different places).

@effigies Welcome to our forum!! This is good to know. We should definitely consider pyproject.toml files then, if things are moving that way!

I’m the author of codemetapy and would be open to extending codemetapy with pyproject.toml support if there is interest. The architecture behind codemetapy explicitly allows multiple input types, even chaining them so you can incrementally build up your metadata, and convert them to codemeta.
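
To make the chaining idea concrete, here is a minimal sketch of building up one record from several inputs. This is not codemetapy's actual implementation or API: the function name `chain_sources`, the example dictionaries, and the rule that a later non-empty value overrides an earlier one are all assumptions made for illustration.

```python
def chain_sources(*sources):
    """Merge metadata dicts in order; later non-empty values win (assumed policy)."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            # Skip empty values so a sparse source never erases earlier data.
            if value in (None, "", [], {}):
                continue
            merged[key] = value
    return merged


# Hypothetical inputs: one derived from packaging metadata, one written by hand.
packaging = {"name": "demo", "version": "0.1.0", "description": ""}
hand_written = {
    "description": "A demo package",
    "codeRepository": "https://example.org/demo",
}

record = chain_sources(packaging, hand_written)
print(record)
```

A real tool would also need a rule for list-valued fields such as authors (replace or append), which this sketch leaves out.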

Ah, I see, codemeta.json is the full JSON-LD descriptor, not the input to some system like codemetapy. (Or, it looks like it can be an input, but it’s always the output.) I had been imagining it as a cmdclass hook to setuptools.setup that would build some RDF representation from the JSON (non-LD) inside the package.

Looking at your setup.py/MANIFEST.in, it’s not getting packaged, so I guess the expected place to look this up would be in the repository source, and not the wheel/sdist on PyPI? If so, then maybe adding information to pyproject.toml instead of directly to codemeta.json would just be duplicative. Or perhaps it could be a more user-friendly source of metadata that isn’t available to setuptools.setup.

I had a look at codemeta and codemetapy a bit more, and they do look like promising standards to follow / tools to use. A few thoughts:

  • codemetapy doesn’t support some of the Python packaging formats. In the long run it probably should, but I don’t think we need to worry about this right now, since we could just tell authors which key/value pairs to put into a setup.py file and not recommend pyproject.toml for now.
  • It’s unclear to me whether codemetapy will work nicely on non-*nix systems. The examples all use various forms of pipes etc., which makes me worry it isn’t built with Windows in mind.
  • I think the main question is which “key:value” pairs we should require. I think that’s separate from the question of which tools we use. We probably want some minimal subset of the codemeta spec.
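
As a rough illustration of that last point, checking a codemeta.json against a required subset could be as small as the sketch below. The keys in `REQUIRED_KEYS` are placeholders I picked for the example, not a decided list, and `missing_keys` is a hypothetical helper rather than part of any existing tool.

```python
import json

# Placeholder subset for illustration only; the real list is still to be decided.
REQUIRED_KEYS = ("name", "version", "description", "license")


def missing_keys(codemeta, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a codemeta record."""
    return [key for key in required if not codemeta.get(key)]


# A tiny in-memory stand-in for the contents of a codemeta.json file.
example = json.loads('{"name": "demo", "version": "0.1.0", "description": ""}')
print(missing_keys(example))
```

Keeping the required list as plain data means the question of which pairs to require stays separate from the tool that checks them.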

Hi there colleagues!! @choldgraf it looks like @proycon (welcome to our forum, Maarten!!!) could be willing to add pyproject.toml support if that is what we want to use.

@proycon does your package work on Windows? This is all awesome conversation.

Ah, I see, codemeta.json is the full JSON-LD descriptor, not the input to some system like codemetapy. (Or, it looks like it can be an input, but it’s always the output.) I had been imagining it as a cmdclass hook to setuptools.setup that would build some RDF representation from the JSON (non-LD) inside the package.

Indeed, it outputs the full JSON-LD descriptor and can do so from multiple sources; the main one for Python packages is currently the output of pip -v. A post-install hook to build the metadata would be a good idea too, and as I said, I’m open to supporting pyproject.toml as an input source.

I’m currently not packaging the codemeta.json for the wheel/sdist on PyPI, though that’s not really deliberate. It’s more a matter of what conventions one wants to establish (and there aren’t many such conventions formulated yet, afaik), and something can indeed be said for packaging the codemeta.json.

It’s unclear to me if codemetapy will work nicely on non-*nix systems…the examples all use various forms of pipes etc, which makes me worry it isn’t built with windows in mind.

Good question. I haven’t really tried, as I’m not a Windows user. In principle, though, I don’t think there’s anything in the codebase that prevents it from running on Windows, as it’s all pure Python (the only thing I can think of is the pyyaml binding being a possible complication, but I assume there’s a Windows wheel for that too). The tool does use standard input and standard output heavily by default (hence the pipes); doesn’t Windows have something similar in PowerShell? We can probably devise workarounds if this is an issue.

I just released a new version of codemetapy (v0.3.0: https://github.com/proycon/codemetapy/releases/tag/v0.3.0). Instead of parsing pip output, it now loads Python metadata using Python 3.8’s new importlib.metadata (or its backported variant for older Python versions). As it queries the metadata after installation, it should work regardless of how the metadata was specified (setup.py, setup.cfg or pyproject.toml) and without needing to be able to parse any of those directly.
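
For anyone who wants to see the general idea, here is a minimal sketch of reading installed-package metadata with importlib.metadata and mapping a few fields to CodeMeta terms. It is a simplified illustration, not the code in codemetapy: the helper names `fields_to_codemeta` and `codemeta_for_installed` and the choice of mapped fields are assumptions for this example.

```python
import json
from importlib import metadata


def fields_to_codemeta(md):
    """Map core packaging metadata fields (any object with .get) to a CodeMeta-style dict."""
    record = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": md.get("Name"),
        "version": md.get("Version"),
        "programmingLanguage": "Python",
    }
    # Optional fields: only copied over when the package declared them.
    optional = {"Summary": "description", "Home-page": "url"}
    for field, key in optional.items():
        value = md.get(field)
        if value:
            record[key] = value
    return record


def codemeta_for_installed(dist_name):
    """Look up an installed distribution by name and convert its metadata."""
    # Raises metadata.PackageNotFoundError if the distribution is not installed.
    return fields_to_codemeta(metadata.metadata(dist_name))


if __name__ == "__main__":
    try:
        print(json.dumps(codemeta_for_installed("pip"), indent=2))
    except metadata.PackageNotFoundError:
        print("pip is not installed in this environment")
```

Because the lookup happens against the installed distribution, the sketch never needs to know whether the fields originally came from setup.py, setup.cfg or pyproject.toml.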

I also made some extra changes to reduce the need for shell operations such as redirecting and piping.

@proycon if we have a meeting more focused on metadata, would you be available to drop in and chat with us? I suspect that you’d have a tremendous amount of knowledge to add to our discussions. @choldgraf what do you think about that? I’m thinking potentially after AGU or in January, depending upon when our next meeting falls. I’d really like to make sure that people who are interested in this topic can attend, and it would be good to discuss some options that we could try to implement!

Wanted to chime in: I’ve been working with a group on getting software metadata from existing disciplinary research software repositories (https://asclnet.github.io/SWRegistryWorkshop/), and would love to get involved in any discussions. I can also send out a notice to the group to see if others would be interested in joining.

Hey @tmorrell, welcome to pyOpenSci!!

This is absolutely awesome to hear. We are very happy to support any standards that the community decides are the way to go. It would be great to have a larger discussion to better understand what your group is working on and how we can collaborate and implement outcomes. I’ve been thinking that we might want to have a specific meeting for this very topic. Because many of the group aren’t able to attend our usual Thursday time, we could potentially find a different time that would accommodate your colleagues as well! Just a thought.

@lwasser Yeah, I’d be glad to join your discussions! (Do note that I’m not involved in the actual codemeta project; I’m just a user and the author of codemetapy, so you might want to get an actual codemeta representative too.)
