In-class exercise
The in-class exercise is distributed via a GitHub Classroom repository. To get access to your group’s git repository, you can follow this link. You will need files from that repository in Task 1; from Task 2 onwards you will work in that repository directly.
Task 1
For this task you will work in your repository of assignment 4.
Copy the file .pre-commit-config.yaml from the exercise into the root folder of the repository from assignment 4.
Open a shell in the root folder of that repository and execute: pre-commit install
Type pre-commit run --all-files. Unless you have been very careful with formatting while working on the assignment, this will reformat a few files.
Commit all the files that have been changed.
We will now also run the ruff pre-commit hook. This hook checks for common errors in Python code and tries to fix them automatically. In some cases an automatic fix is not possible and you will have to fix the error manually. We will also see how to configure the behaviour of pre-commit hooks using the pyproject.toml configuration file.
Open the .pre-commit-config.yaml file and uncomment the entire ruff section. Now run pre-commit run --all-files again. This will reformat and fix some more files.
Use git diff or the like to see what has changed.
Commit all the files that have been changed. You will notice that the pre-commit hooks run again before the commit is completed. If errors remain, skip the hooks by adding the --no-verify flag to the commit command. You should hardly ever use this flag; here, we do it for didactic reasons.
Run pre-commit run --all-files again. Very likely, there will still be errors left. Before you can commit without --no-verify, you need to get rid of them. This can be done in one of two ways:
    Adjust the source code manually so that it conforms to expectations.
    Tell the linter (ruff, yamllint, …) to ignore them. Some errors are safe to ignore in certain projects. E.g., ruff will complain about print statements. This makes perfect sense in serious projects, but for our class exercises print statements are perfectly fine.
Go through the errors you got in the previous step and decide which strategy you want to apply for each of them.
In order to ignore things, you will need to locate the [tool.ruff] sections in the pyproject.toml file. There you can add exceptions for specific errors, either project-wide or for specific files.
After each fix, re-run pre-commit run --all-files to see if the error is gone and/or you introduced a new one.
Commit all the files that have been changed.
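Besides the pyproject.toml exceptions described above, ruff also honours inline noqa comments that silence a single rule on a single line. The sketch below is only an illustration: the function report_progress is hypothetical, and it assumes that the print-related rule (code T201 from the flake8-print rule set) is among the rules enabled in your configuration. If that rule is not enabled, ruff may instead report the noqa comment itself as unused.

```python
def report_progress(n_done: int, n_total: int) -> str:
    """Build a progress message, print it, and return it."""
    message = f"Processed {n_done} of {n_total} files"
    # Assumption: the flake8-print rules are enabled, so ruff would flag this
    # call as T201. The noqa comment silences that one rule on this line only.
    print(message)  # noqa: T201
    return message
```

An inline comment like this is a reasonable choice for a one-off exception. When the same rule should be ignored everywhere or for whole files, the pyproject.toml route from the steps above is usually easier to maintain.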
Task 2
The following snippet is a minimal environment file from the screencast:
name: mini-env
channels: [conda-forge, nodefaults]
dependencies:
  - python==3.12
  - pandas
  - pip:
      - pdbp
Create the environment on your computer.
Use conda list to see which packages have been installed into the environment. (You will find many more than were listed in the file.)
To use modern pandas we need pandas 2.1 or newer and pyarrow 13.0 or newer. Even though you will have gotten very recent versions already, it is a good idea to add this requirement to the environment file. Do it and update or re-create the environment.
For plotting we will need plotly and kaleido. Unfortunately, kaleido can only be installed via pip. Add both packages to the yaml file and update or re-create the environment.
Delete the environment.
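After updating the environment, you may want to confirm from within Python that the minimum versions named above are actually met. The following is a minimal sketch using only the standard library; the helper names parse_release, meets_minimum and check_environment are made up for this illustration, and the comparison only looks at the leading numeric part of a version string (so a suffix such as rc1 is ignored).

```python
import re
from importlib.metadata import version


def parse_release(version_string: str) -> tuple[int, ...]:
    """Return the leading numeric components of a version string as integers."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        msg = f"Cannot parse version string: {version_string!r}"
        raise ValueError(msg)
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Check whether an installed version is at least the required minimum."""
    installed_parts = parse_release(installed)
    minimum_parts = parse_release(minimum)
    # Pad the shorter tuple with zeros so that "2" and "2.0" compare as equal.
    length = max(len(installed_parts), len(minimum_parts))
    installed_padded = installed_parts + (0,) * (length - len(installed_parts))
    minimum_padded = minimum_parts + (0,) * (length - len(minimum_parts))
    return installed_padded >= minimum_padded


def check_environment(requirements: dict[str, str]) -> dict[str, bool]:
    """Look up each installed package and compare it against its minimum."""
    return {
        package: meets_minimum(version(package), minimum)
        for package, minimum in requirements.items()
    }
```

Inside the activated environment, calling check_environment({"pandas": "2.1", "pyarrow": "13.0"}) returns one boolean per package. If a package is not installed at all, importlib.metadata.version raises PackageNotFoundError, which is a useful signal in itself.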
Task 3
Go over your solution to assignment 4 and try to improve the following:
Do all functions have meaningful names that start with a verb in the imperative mood?
Do all variables have meaningful names that follow the standard Python naming conventions?
Are there still places in your code that need comments? Could you restructure your code in a way that makes it obvious enough so you don’t need comments, for example by splitting it differently into subfunctions?
Are there any places where you should bundle multiple variables into a NamedTuple or dataclass to make the code more readable or to reduce the number of variables that have to be passed to subfunctions?
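To make the questions above concrete, here is a small sketch of what bundling related variables can look like. It is not taken from assignment 4: the class PlotSettings and the functions compute_aspect_ratio and create_caption are hypothetical and only illustrate verb-first function names, snake_case variable names, and a frozen dataclass that replaces three separate arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlotSettings:
    """Settings that would otherwise travel as three separate arguments."""

    title: str
    width_px: int
    height_px: int


def compute_aspect_ratio(settings: PlotSettings) -> float:
    """Return the width divided by the height of the plot."""
    return settings.width_px / settings.height_px


def create_caption(settings: PlotSettings) -> str:
    """Build a caption that states the plot title and its size in pixels."""
    return f"{settings.title} ({settings.width_px} x {settings.height_px} px)"
```

With settings = PlotSettings(title="Wages", width_px=800, height_px=400), each subfunction receives one argument instead of three, and the field names document what each value means. Marking the dataclass as frozen is a design choice that prevents accidental modification after creation; a NamedTuple would serve a similar purpose.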