In-class exercise


The in-class exercise is distributed via a GitHub Classroom repository. To get access to your group’s Git repository, follow this link. You will need files from that repository in Task 1; from Task 2 onwards, you will work in that repository directly.

Task 1

For this task, you will work in your repository for assignment 4.

  • Copy the file .pre-commit-config.yaml from the exercise into the root folder of the repository from assignment 4.

  • Open a shell in the root folder of that repository and execute: pre-commit install.

  • Type pre-commit run --all-files. Unless you have been very careful with formatting while working on the assignment, this will reformat a few files.

  • Commit all the files that have been changed.

We will now also run the ruff pre-commit hook. This hook checks for common errors in Python code and tries to fix them automatically. In some cases an automatic fix is not possible and you will have to fix the error manually. We will also see how to configure the behaviour of pre-commit hooks using the pyproject.toml configuration file.

  • Open the .pre-commit-config.yaml file and uncomment the entire ruff section. Now run pre-commit run --all-files again. This will reformat and fix some more files.

  • Use git diff or the like to see what has changed.

  • Commit all the files that have been changed. You will notice that the pre-commit hooks run again before the commit is completed. If errors remain, skip the hooks by adding the --no-verify flag to the commit command. You should hardly ever use this flag; we do it here for didactic reasons only.

  • Run pre-commit run --all-files again. Very likely, there will still be errors left. Before you can commit without skipping the hooks, you need to get rid of them. This can be done in one of two ways:

    1. Adjust the source code manually so that it conforms to expectations.

    2. Tell the linter (ruff, yamllint, …) to ignore them. Some errors are safe to ignore in certain projects. E.g., ruff will complain about print statements. This makes perfect sense in serious projects. For our class exercises, they are perfectly fine.

  • Go through the errors you got in the previous step and decide which strategy you want to apply for each of them.

    To ignore errors, you will need to locate the [tool.ruff] sections in the pyproject.toml file. There you can add exceptions for specific errors, either project-wide or for specific files.

    After each fix, re-run pre-commit run --all-files to see if the error is gone and/or you introduced a new one.

  • Commit all the files that have been changed.
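Besides the exceptions in pyproject.toml described above, ruff also honours inline `# noqa: <code>` comments, which silence one rule on one line. The following is a minimal sketch, assuming that the print rule shows up in your output under the code T201 (the code ruff uses for print calls when the flake8-print rules are enabled). The function `summarize_scores` and its data are hypothetical and only serve as an illustration.

```python
def summarize_scores(scores: list[float]) -> float:
    """Return the mean of the scores and report it on the console."""
    mean_score = sum(scores) / len(scores)
    # The trailing comment silences only rule T201, and only on this line.
    # All other rules, and all other print calls, are still checked.
    print(f"Mean score: {mean_score:.2f}")  # noqa: T201
    return mean_score


if __name__ == "__main__":
    # Hypothetical example data.
    summarize_scores([1.0, 2.0, 4.5])
```

An inline comment is a reasonable choice when a single line is a deliberate exception. If the same rule fires all over a project, a project-wide or per-file exception in pyproject.toml keeps the source code less cluttered.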

Task 2

The following snippet is a minimal environment file from the screencast:

name: mini-env
channels: [conda-forge, nodefaults]
dependencies:
  - python==3.12
  - pandas
  - pip:
      - pdbp

  1. Create the environment on your computer

  2. Use conda list to see which packages have been installed into the environment. You will find many more than those listed in the file.

  3. To use modern pandas, we need pandas 2.1 or newer and pyarrow 13.0 or newer. Even though you will already have very recent versions, it is a good idea to add these requirements to the environment file. Do so and update or re-create the environment.

  4. For plotting we will need plotly and kaleido. Unfortunately, kaleido can only be installed via pip. Add both packages to the yaml file and update or re-create the environment.

  5. Delete the environment
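Before deleting the environment in step 5, you may want to confirm from within Python that the version requirements from step 3 are met. The following is a rough sketch using the standard library's `importlib.metadata`. The helper names are hypothetical, and the parsing assumes that the first two components of a version string are plain integers (as in "2.1.4").

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def get_installed_version(package_name: str) -> str | None:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def parse_major_minor(version_string: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.1.4'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def check_minimum_version(package_name: str, minimum: tuple[int, int]) -> bool:
    """Return True if the package is installed in at least the minimum version."""
    installed = get_installed_version(package_name)
    if installed is None:
        return False
    return parse_major_minor(installed) >= minimum


if __name__ == "__main__":
    # Minimum versions as stated in step 3.
    requirements = {"pandas": (2, 1), "pyarrow": (13, 0)}
    for name, minimum in requirements.items():
        status = "ok" if check_minimum_version(name, minimum) else "too old or missing"
        print(f"{name}: {get_installed_version(name)} ({status})")
```

Run the script with the environment activated; conda list gives the same information without any code.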

Task 3

Go over your solution to assignment 4 and try to improve the following:

  1. Do all functions have meaningful names that start with a verb in the imperative mood?

  2. Do all variables have meaningful names that follow the standard Python naming conventions?

  3. Are there still places in your code that need comments? Could you restructure your code in a way that makes it obvious enough so you don’t need comments, for example by splitting it differently into subfunctions?

  4. Are there any places where you should bundle multiple variables into a NamedTuple or dataclass to make the code more readable or to reduce the number of variables that have to be passed to subfunctions?
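As an illustration of points 1, 2, and 4, here is a small sketch with hypothetical names that are not taken from assignment 4. Several related values are bundled into a frozen dataclass, the functions start with an imperative verb, and all names follow the snake_case (functions, variables) and CapWords (classes) conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlotSettings:
    """Bundle of settings that would otherwise travel as separate arguments."""

    title: str
    width: int
    height: int
    color: str = "steelblue"


def calculate_aspect_ratio(settings: PlotSettings) -> float:
    """Return width divided by height."""
    return settings.width / settings.height


def format_plot_label(settings: PlotSettings) -> str:
    """Return a one-line description of the plot."""
    return f"{settings.title} ({settings.width}x{settings.height}, {settings.color})"


if __name__ == "__main__":
    settings = PlotSettings(title="GDP", width=800, height=400)
    print(format_plot_label(settings))
    print(calculate_aspect_ratio(settings))
```

Passing one `PlotSettings` object instead of four loose arguments keeps the function signatures short, and `frozen=True` prevents subfunctions from modifying the settings by accident. A NamedTuple would work similarly if you want tuple behaviour such as unpacking.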