Python packaging guide: tests and data for your Python package

We’ve updated the testing section of our Python packaging guide, and will be using this space to talk through each of the four main sections this week:

:thinking: Why write tests?

:sparkles: Types of tests

:running_man: Run tests locally

:computer: Run tests online

Tests are an important part of your Python package because they provide a set of checks that ensure that your package is functioning how you expect it to. By the end of this section of our Python packaging guide, you’ll understand the importance of writing tests for your Python package and know how to get started with setting up infrastructure to run your tests both locally and on GitHub.


originally published as part of pyOpenSci’s Python Packaging Guide

Write tests for your Python package

Code that tests your package code, known as a test suite, is important for you as a maintainer, your users, and package contributors. Test suites consist of sets of functions, methods, and classes that are written with the intention of making sure a specific part of your code works as you expect it to.

Why write tests for your package?

Tests act as a safety net for code changes. They help you spot and rectify bugs before they affect users. Tests also instill confidence that code alterations from contributors won’t break existing functionality.

Writing tests for your Python package is important because:

  • Catch Mistakes: Tests are a safety net. When you make changes or add new features to your package, tests can quickly tell you if you accidentally broke something that was working fine before.
  • Save Time: Imagine you have a magic button that can automatically check whether your package is still working properly. Tests are like that magic button! They can run all of those checks for you, saving you time.
  • Easier Collaboration: If you’re working with others, or have outside contributors, tests help everyone stay on the same page. Your tests explain how your package is supposed to work, making it easier for others to understand and contribute to your project.
  • Fearless Refactoring: Refactoring means improving your code structure without changing its behavior. Tests empower you to make these changes confidently: if you break something, test failures will let you know.
  • Documentation: Tests serve as technical examples of how to use your package. This can be helpful for a new technical contributor who wants to contribute code to your package. They can look at your tests to understand how the parts of your code fit together.
  • Long-term ease of maintenance: As your package evolves, tests ensure that your code continues to behave as expected, even as you make changes over time. Thus you are helping your future self when you write tests.
  • Easier pull request reviews: By running your tests in a CI framework such as GitHub Actions each time you or a contributor makes a change to your code base, you can catch issues before they are merged. This ensures that your software behaves the way you expect it to.

Tests for user edge cases

Edge cases refer to unexpected or “outlier” ways that some users may use your package. Tests enable you to address various edge cases that could impair your package’s functionality. For example, what occurs if a function expects a pandas dataframe but a user supplies a numpy array? Does your code gracefully handle this situation, providing clear feedback, or does it leave users frustrated by an unexplained failure?
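As a sketch of that pandas/numpy edge case, the hypothetical function below validates its input and raises a clear error. The function and test names are illustrative, not from the guide:

```python
import numpy as np
import pandas as pd
import pytest


def summarize_columns(df):
    """Hypothetical package function: return the mean of each DataFrame column."""
    if not isinstance(df, pd.DataFrame):
        # Fail gracefully with a clear message instead of a confusing traceback
        raise TypeError(
            f"summarize_columns expects a pandas DataFrame, got {type(df).__name__}"
        )
    return df.mean()


def test_summarize_columns_rejects_numpy_array():
    """Edge case: a user passes a numpy array instead of a DataFrame."""
    with pytest.raises(TypeError, match="expects a pandas DataFrame"):
        summarize_columns(np.array([[1.0, 2.0], [3.0, 4.0]]))
```

Without the explicit check, a numpy array would silently take a different code path (ndarrays also have a .mean() method), returning a single scalar instead of per-column means — exactly the kind of unexplained behavior that frustrates users.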

For a good introduction to testing, see this Software Carpentry lesson

Test examples

Let’s say you have a Python function that adds two numbers a and b together.

def add_numbers(a, b):
    return a + b

A test to ensure that function runs as you might expect when provided with different numbers might look like this:

def test_add_numbers():
    result = add_numbers(2, 3)
    assert result == 5, f"Expected 5, but got {result}"

    result2 = add_numbers(-1, 4)
    assert result2 == 3, f"Expected 3, but got {result2}"

    result3 = add_numbers(0, 0)
    assert result3 == 0, f"Expected 0, but got {result3}"


How do I know what type of tests to write?

This section has been adapted from a presentation by Nick Murphy.

At this point, you may be wondering - what should you be testing in your package? Below are a few examples:

  • Test some typical cases: Test that the package functions as you expect it to when users use it. For instance, if your package is supposed to add two numbers, test that the outcome value of adding those two numbers is correct.

  • Test special cases: Sometimes there are special or outlier cases. For instance, if a function performs a calculation that may become problematic closer to the value 0, test it with inputs of 0 and values close to 0.

  • Test at and near the expected boundaries: If a function requires a value that is greater than or equal to 1, make sure that the function still works with the value 1 itself as well as values slightly above it, such as 1.001 (something close to the constraint value).

  • Test that code fails correctly: If a function requires a value greater than or equal to 1, then test at 0.999. Make sure that the function fails gracefully when given unexpected values and that the user can easily understand why it failed (i.e., it provides a useful error message).
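The bullets above can be sketched with pytest. The function below is hypothetical; its "greater than or equal to 1" constraint mirrors the example in the text:

```python
import math

import pytest


def scaled_log(x):
    """Hypothetical function that requires x >= 1."""
    if x < 1:
        raise ValueError(f"x must be >= 1, got {x}")
    return math.log(x)


# Typical cases, plus values at and near the expected boundary (1 and 1.001)
@pytest.mark.parametrize("x", [1, 1.001, 10, 1000])
def test_scaled_log_valid_inputs(x):
    assert scaled_log(x) >= 0


def test_scaled_log_fails_correctly():
    # Just below the boundary: the function should fail with a useful message
    with pytest.raises(ValueError, match="must be >= 1"):
        scaled_log(0.999)
```

The @pytest.mark.parametrize decorator runs the same test once per input value, which keeps boundary and typical cases together in one place.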



Test Types for Python packages

Three types of tests: Unit, Integration & Functional Tests

There are different types of tests that you want to consider when creating your test suite:

  1. Unit tests
  2. Integration tests
  3. End-to-end (also known as Functional) tests

Each type of test has a different purpose. Here, you will learn about all three types of tests.

Unit Tests

A unit test involves testing individual components or units of code in isolation to ensure that they work correctly. The goal of unit testing is to verify that each part of the software, typically at the function or method level, performs its intended task correctly.

Unit tests can be compared to examining each piece of your puzzle to ensure parts of it are not broken. If all of the pieces of your puzzle don’t fit together, you will never complete it. Similarly, when working with code, tests ensure that each function, attribute, class, and method works properly in isolation.

Unit test example: Pretend that you have a function that converts a temperature value from Celsius to Fahrenheit. A test for that function might ensure that when provided with a value in Celsius, the function returns the correct value in degrees Fahrenheit. That test is a unit test: it checks a single unit (function) in your code.

# Example package function
def celsius_to_fahrenheit(celsius):
    """Convert temperature from Celsius to Fahrenheit.

    Args:
        celsius (float): Temperature in Celsius.

    Returns:
        float: Temperature in Fahrenheit.
    """
    fahrenheit = (celsius * 9/5) + 32
    return fahrenheit

Example unit test for the above function. You’d run this test using the pytest command in your tests/ directory.

import pytest
from temperature_converter import celsius_to_fahrenheit

def test_celsius_to_fahrenheit():
    """Test the celsius_to_fahrenheit function."""
    # Test with freezing point of water
    assert pytest.approx(celsius_to_fahrenheit(0), abs=0.01) == 32.0

    # Test with boiling point of water
    assert pytest.approx(celsius_to_fahrenheit(100), abs=0.01) == 212.0

    # Test with a negative temperature
    assert pytest.approx(celsius_to_fahrenheit(-40), abs=0.01) == -40.0

Your unit tests should ensure each part of your code works as expected on its own.

Integration tests

Integration tests involve testing how parts of your package work together or integrate. Integration tests can be compared to connecting a bunch of puzzle pieces together to form a whole picture. Integration tests focus on how different pieces of your code fit and work together.

For example, suppose you have a series of steps that collects temperature data in a spreadsheet, converts it between degrees Celsius and Fahrenheit, and then provides an average temperature for a particular time period. An integration test would ensure that all parts of that workflow behave as expected.

def fahr_to_celsius(fahrenheit):
    """Convert temperature from Fahrenheit to Celsius.

    Args:
        fahrenheit (float): Temperature in Fahrenheit.

    Returns:
        float: Temperature in Celsius.
    """
    celsius = (fahrenheit - 32) * 5/9
    return celsius

# Function to calculate the mean temperature for each year and the final mean
def calc_annual_mean(df):
    # TODO: make this a bit more robust so we can write integration test examples??
    # Calculate the mean temperature for each year
    yearly_means = df.groupby('Year').mean()

    # Calculate the final mean temperature across all years
    final_mean = yearly_means.mean()

    # Return a converted value
    return fahr_to_celsius(yearly_means), fahr_to_celsius(final_mean)
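An integration test for this small workflow might chain the two functions together. The test data below are invented for illustration, and the expected values are worked out from the conversion formula above:

```python
import math

import pandas as pd


def fahr_to_celsius(fahrenheit):
    """Convert temperature from Fahrenheit to Celsius (same as above)."""
    return (fahrenheit - 32) * 5 / 9


def calc_annual_mean(df):
    """Mean temperature per year, plus the overall mean (same as above)."""
    yearly_means = df.groupby("Year").mean()
    final_mean = yearly_means.mean()
    return fahr_to_celsius(yearly_means), fahr_to_celsius(final_mean)


def test_calc_annual_mean_workflow():
    # Invented data: two years of Fahrenheit readings
    df = pd.DataFrame(
        {"Year": [2021, 2021, 2022, 2022], "Temp": [32.0, 50.0, 32.0, 212.0]}
    )
    yearly_c, final_c = calc_annual_mean(df)
    # 2021 mean: 41 F -> 5 C; 2022 mean: 122 F -> 50 C; overall: 81.5 F -> 27.5 C
    assert math.isclose(yearly_c.loc[2021, "Temp"], 5.0)
    assert math.isclose(yearly_c.loc[2022, "Temp"], 50.0)
    assert math.isclose(final_c["Temp"], 27.5)
```

Unlike the unit test earlier, a failure here could come from either function or from how they are wired together — which is exactly what an integration test is meant to catch.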

End-to-end (functional) tests

End-to-end tests (also referred to as functional tests) in Python are like comprehensive checklists for your software. They simulate real user end-to-end workflows to make sure the code base supports real life applications and use-cases from start to finish. These tests help catch issues that might not show up in smaller tests and ensure your entire application or program behaves correctly. Think of them as a way to give your software a final check before it’s put into action, making sure it’s ready to deliver a smooth experience to its users.

End-to-end or functional tests represent an entire workflow that you expect your package to support.

End-to-end tests also test how a program runs from start to finish. For example, a tutorial in your documentation that runs in CI in an isolated environment is another form of end-to-end test.

Note: For scientific packages, creating short tutorials that highlight core workflows that your package supports, that are run when your documentation is built could also serve as end-to-end tests.

Comparing unit, integration and end-to-end tests

Unit tests, integration tests, and end-to-end tests have complementary advantages and disadvantages. The fine-grained nature of unit tests makes them well-suited for isolating where errors are occurring. However, unit tests are not useful for verifying that different sections of code work together.

Integration and end-to-end tests verify that the different portions of the program work together, but are less well-suited for isolating where errors are occurring. For example, when you refactor your code, it is possible that your end-to-end tests will break. But if the refactor didn’t introduce new behavior to your existing code, then you can rely on your unit tests to continue to pass, testing the original functionality of your code.

It is important to note that you don’t need to spend energy worrying about the specifics surrounding the different types of tests. When you begin to work on your test suite, consider what your package does and how you may need to test parts of it. Being familiar with the different types of tests provides a framework to help you think about writing tests and about how the different types can complement each other.


Run Python package tests

Running your tests is important to ensure that your package is working as expected. It’s good practice to consider that tests will run on your computer and on your users’ computers, which may be running different Python versions and operating systems. Think about the following when running your tests:

  1. Run your test suite in a matrix of environments that represent the Python versions and operating systems your users are likely to have.
  2. Running your tests in an isolated environment provides confidence in the tests and their reproducibility. This ensures that tests do not pass randomly due to your computer’s specific setup. For instance, you might have unexpectedly installed dependencies on your local system that are not declared in your package’s dependency list. This oversight could lead to issues when others try to install or run your package on their computers.

On this page, you will learn about the tools that you can use to both run tests in isolated environments and across Python versions.

Tools to run your tests

There are three categories of tools that will make it easier to set up and run your tests in various environments:

  1. A test framework, is a package that provides a particular syntax and set of tools for both writing and running your tests. Some test frameworks also have plugins that add additional features such as evaluating how much of your code the tests cover. Below you will learn about the pytest framework which is one of the most commonly used Python testing frameworks in the scientific ecosystem. Testing frameworks are essential but they only serve to run your tests. These frameworks don’t provide a way to easily run tests across Python versions without the aid of additional automation tools.
  2. Automation tools allow you to automate running workflows such as tests in specific ways using user-defined commands. For instance, tools such as nox and tox let you run tests across different Python versions with a single command. However, it will be difficult to test your build on different operating systems using only nox or tox; this is where continuous integration (CI) comes into play.
  3. Continuous Integration (CI) is the last tool that you’ll need to run your tests. CI not only allows you to replicate any automated builds you create using nox or tox to run your package in different Python environments; it also allows you to run your tests on different operating systems (Windows, Mac, and Linux). We discuss using CI to run tests below.

What testing framework / package should I use to run tests?

We recommend using Pytest to build and run your package tests. Pytest is the most common testing tool used in the Python ecosystem.

The Pytest package also has a number of extensions that can be used to add functionality such as:

  • pytest-cov allows you to analyze the code coverage of your package during your tests, and generates a report that you can upload to codecov.

Note: Your editor or IDE may add additional conveniences for running tests, setting breakpoints, and toggling the --no-cov flag. Check your editor’s documentation for more information.

Run tests using pytest

If you are using pytest, you can run your tests locally by calling:

pytest

Or, if you want to run a specific test file - for example, a file named - you can run:

pytest tests/

Learn more from the pytest getting started docs.

Running pytest on your computer is going to run your tests in whatever Python environment you currently have activated. This means that tests will be run on a single version of Python and only on the operating system that you are running locally.

An automation tool can simplify the process of running tests in various Python environments.

Tests across operating systems: If you want to run your tests across different operating systems, you can use continuous integration, discussed below.

Tools to automate running your tests

To run tests on various Python versions or in various specific environments with a single command, you can use an automation tool such as nox or tox. Both nox and tox can create isolated virtual environments, which allows you to easily run your tests in multiple environments and across Python versions.

We will focus on Nox in this guide. nox is a Python-based automation tool that builds upon the features of both make and tox. nox is designed to simplify and streamline testing and development workflows. Everything that you do with nox can be implemented using a Python-based interface.

Other automation tools you’ll see in the wild:

  • Hatch is a modern end-to-end packaging tool that works with the popular build backend called hatchling. hatch offers a tox-like setup where you can run tests locally using different Python versions. If you are using hatch to support your packaging workflow, you may want to also use its testing capabilities rather than using nox.
  • make: Some developers use Make, which is a build automation tool, for running tests due to its versatility; it’s not tied to a specific language and can be used to run various build processes. However, Make’s unique syntax and approach can make it more challenging to learn, particularly if you’re not already familiar with it. Make also won’t manage environments for you like nox will do.

Run tests across Python versions with nox

Nox is a great automation tool to learn because it:

  • Is Python-based, making it accessible if you already know Python, and
  • Will create isolated environments to run workflows.

nox simplifies creating and managing testing environments. With nox, you can set up virtual environments, and run tests across Python versions using the environment manager of your choice with a single command.

Note on Nox Installations

  • When you install and use nox to run tests across different Python versions, nox will create and manage individual venv environments for each Python version that you specify in the nox function.
  • Nox will manage each environment on its own.

Nox can also be used for other development tasks such as building documentation, creating your package distribution, and testing installations across both PyPI related environments (e.g. venv, virtualenv) and conda (e.g. conda-forge).

To get started with nox, you create a file called noxfile.py at the root of your project directory. You then define commands using Python functions. Some examples of that are below.

Test Environments

By default, nox uses the Python built in venv environment manager. A virtual environment (venv) is a self-contained Python environment that allows you to isolate and manage dependencies for different Python projects. It helps ensure that project-specific libraries and packages do not interfere with each other, promoting a clean and organized development environment.
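For reference, this is the kind of environment nox creates and manages for you behind the scenes. A minimal sketch of doing it by hand with venv (bash/zsh syntax; .venv is just a conventional directory name):

```shell
# Create a virtual environment in a .venv directory
python -m venv .venv

# Activate it (bash / zsh; on Windows use .venv\Scripts\activate instead)
source .venv/bin/activate

# The environment has its own python and pip, isolated from the system install
python -m pip list
```

The advantage of nox is that it repeats this create-install-run cycle for every Python version you declare, so you never activate or clean up these environments manually.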

An example of using nox to run tests in venv environments for Python versions 3.9, 3.10, 3.11 and 3.12 is below.

Warning: Note that for the code below to work, you need to have all four versions of Python installed on your computer so that nox can find them.

Nox with venv environments

Below is an example of setting up nox to run tests using venv, the built-in environment manager that ships with base Python.

Note that the example below assumes that you have set up your pyproject.toml to declare test dependencies in a way that pip can understand. An example of that setup is below.

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "pyosPackage"
version = "0.1.0"
dependencies = [
]

[project.optional-dependencies]
tests = ["pytest", "pytest-cov"]

If you have the above setup, then you can use session.install(".[tests]") to install your test dependencies. Notice that below, a single nox session allows you to run your tests on four different Python versions (3.9, 3.10, 3.11, and 3.12).

# This code would live in a noxfile.py file located at the root of your project directory
import nox

# For this to run you will need to have python3.9, python3.10, python3.11, and
# python3.12 installed on your computer. Otherwise nox will skip running tests
# for whatever versions are missing

@nox.session(python=["3.9", "3.10", "3.11", "3.12"])
def test(session):
    # Install the package plus its test dependencies
    session.install(".[tests]")

    # Run tests"pytest")

Above, you create a nox session in the form of a function with a @nox.session decorator. Notice that within the decorator you declare the versions of Python that you wish to run.

To run the above you’d execute the following command, specifying which session with --session (sometimes shortened to -s). Your function above is called test, therefore the session name is test.

nox --session test

Nox with conda / mamba

Below is an example of setting up nox to use mamba (or conda) as your environment manager. Note that unlike venv, conda can automatically install the various versions of Python that you need; you won’t need to install all four Python versions yourself, as you do with venv.

Note: For conda to work with nox , you will need to ensure that either conda or mamba is installed on your computer.

# This code should live in your noxfile.py file
import nox

# The syntax below allows you to use mamba / conda as your environment manager.
# If you use this approach you don’t have to worry about installing different
# versions of Python yourself

@nox.session(venv_backend="mamba", python=["3.9", "3.10", "3.11", "3.12"])
def test_mamba(session):
    """Nox function that installs dev requirements and runs
    tests on Python 3.9 through 3.12"""
    # Install dev requirements
    session.install(".[tests]")
    # Run tests using any parameters that you need"pytest")

To run the above session you’d use:

nox --session test_mamba


Run tests with Continuous Integration

Running your test suite locally is useful as you develop code and test new features or changes to the code base. However, you will also want to set up Continuous Integration (CI) to run your tests online. CI allows you to run all of your tests in the cloud. While you may only be able to run tests locally on a single operating system, CI lets you run your tests on various versions of Python and across different operating systems.

CI can also be triggered for pull requests and pushes to your repository. This means that every pull request that you, your maintainer team or a contributor submit, can be tested. In the end CI testing ensures your code continues to run as expected even as changes are made to the code base.

CI & pull requests

CI is invaluable if you have outside people contributing to your software. You can set up CI to run on all pull requests submitted to your repository. CI can make your repository more friendly to new potential contributors, allowing users to contribute code, documentation fixes, and more without having to create development environments, run tests, and build documentation locally.

Example GitHub action that runs tests

Below is an example GitHub Actions workflow that runs tests using nox on Windows, Mac, and Linux, and on Python versions 3.9 through 3.11.

To work properly, this file should be located in the .github/workflows/ directory at the root of your GitHub repository:

.github/
└── workflows/
    └── run-tests.yml # The name of this file can be whatever you wish

name: Pytest unit/integration

on:
  push:
    branches:
      - main
  pull_request:

# Use bash by default in all jobs
defaults:
  run:
    shell: bash

jobs:
  build:
    name: Test Run (${{ matrix.python-version }}, ${{ matrix.os }})
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: ["ubuntu-latest", "macos-latest", "windows-latest"]
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install nox
      - name: List installed packages
        run: pip list
      - name: Run tests with pytest & nox
        run: |
          nox -s tests-${{ matrix.python-version }}
      # You only need to upload code coverage once to codecov unless you have a
      # more complex build that you need coverage for.
      - name: Upload coverage to Codecov
        if: ${{ matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10' }}
        uses: codecov/codecov-action@v3

Thanks for this helpful guide. Could you offer some advice about where to put data in the project structure? There’s a note in the Python Package Structure for Scientific Python Projects guide that says, “If your package tests require data, we suggest that you do NOT include that data within your package structure. We will discuss this in more detail in a tutorial.” Could you point me to where that is discussed?

In my case, I am often developing a package as part of a scientific project that includes data, which I put at the top level of the project directory structure. An example tree is below.

├── LICENSE.txt
├── data
│   └── project-data.csv
├── notebooks
│   └── preliminary-analysis.ipynb
├── pyproject.toml
├── src
│   ├── data_dependent
│   │   ├──
│   │   └──
│   └── generic
│       ├──
│       ├──
│       └──
└── tests

Here, the generic directory includes the package’s data-independent code, and the data_dependent directory includes the code that loads and analyzes data from the top-level data directory. My approach has been to add the following code to a module in data_dependent, to initialize the path DATA_DIR to the data directory for use elsewhere, such as in notebooks/preliminary-analysis.ipynb.

from pathlib import Path

cur_path = Path(__file__).parent.absolute()
DATA_DIR = (cur_path / ".." / ".." / "data").resolve()

Thanks for any help you can offer!


Hi @jsdodge this is a great question, thanks for asking.

The answer depends on what you’re trying to achieve.

If the package is part of a single project that produces results, then your approach sounds perfect. In fact, having a ./data directory at the root of the project is exactly what is recommended by “Good Enough Practices for Scientific Computing”. There is a package, pyprojroot, that in essence does exactly what you do in your data_dependent module, to make the whole project portable.

To be clear, when I say “project”, I mean something like code that accompanies a paper. Also known as a repro-pack or a research compendium.
From your examples, it sounds like that is what you are describing.

On the other hand, if you are developing a more general-use package that you intend to publish so other researchers can use it, then you may want data for other reasons besides producing results. Our guide is meant for a scientist trying to publish a package, and so when we talk about “including data”, that’s what we mean. The two main reasons you want to include data are (1) to test your package, and (2) to provide to users as examples.

Please know I don’t mean that to sound like “our guide is not for you”, if you are working on a project for a paper right now; packages that start out life as part of a paper often turn into more general purpose libraries later, of course.

If you need data for tests or as examples for users, then the answer gets a lot more nuanced, because it really depends on the data.

We have discussed adding a section to the guide on including data with packages–you can see a draft here. Any feedback would be welcome. The short answer is, if the data you want to include is anything larger than a handful of files with a total size > roughly 100 MB, you do not want to include it in the git repo, and you should instead host it somewhere and download it as needed for tests or for a user. If you are under that rough size limit, then you can include the files in the built package, and let a user access them through a function without downloading. More detail on how you would do all of that is in that draft post.
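As a minimal stdlib-only sketch of the “host it somewhere and download it as needed” approach — the URL and cache location here are hypothetical placeholders, and in practice a dedicated tool like pooch adds caching and checksum verification on top of this pattern:

```python
import urllib.request
from pathlib import Path

# Hypothetical placeholders -- swap in your real data host and package name
DATA_URL = ""
CACHE_DIR = Path.home() / ".cache" / "mypackage"


def fetch_data(filename):
    """Download a data file on first use and cache it locally.

    Returns the local path, so tests and user examples can read the file
    without the data ever living in the git repo or the built package.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    target = CACHE_DIR / filename
    if not target.exists():
        # Only hit the network when the file is not already cached
        urllib.request.urlretrieve(f"{DATA_URL}/{filename}", target)
    return target
```

Tests can then call fetch_data("some-file.csv") in a fixture, and users pay the download cost only once.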

Hope I’m not over-explaining anything here, just trying to write this as clearly as possible in case anyone else comes upon the post.

Please let me know if that helps!


Thanks, @NickleDave ! This is exactly the kind of advice and supporting documentation that I was looking for, and both of the use cases that you described are relevant for me.