I’m part of a little employee-owned data & software engineering org called Catalyst Cooperative. We mostly work with US energy system data and support researchers and policy NGOs trying to accelerate the transition away from fossil fuels.
We’re planning to do a bunch of community outreach in 2024 (hopefully with support from the NSF POSE grant program) to get more people familiar with the open data we publish, to help the community adopt best practices in reproducible data processing, and to introduce the kinds of software tooling that make it easy to work with tables containing millions to billions of rows.
All of our own tooling is written in Python, but we’re moving toward distributing tabular data as SQLite databases or Parquet files, so folks can use R, DuckDB, or whatever other tools they’re most familiar with, without having to install or run the huge pile of dependencies we need to produce the data.
We’re interested in developing a series of example notebooks or other tutorial materials to help students (really anyone from undergrads to postdocs) working on energy systems get up to speed with data analysis in Python, using relevant data that they’ll hopefully find interesting and useful in their own research.
This seems closely related to the work pyOpenSci is doing, and we’re wondering how we might learn from and participate in your community development efforts, to help the open energy modeling and data community become better organized, more familiar with reproducible open science standards, and more of a coherent open source ecosystem.
Would this make sense as an organizational collaboration? We’re also reaching out to The Carpentries, and we were thinking about talking to US-RSE as well.