In-class exercise#

The in-class exercise is distributed via a GitHub Classroom repository. To get access to your group’s Git repository, follow this link.

The first person to follow the link creates the group and sets up its repository; the others then land directly on the webpage of the group’s repository. All group members can then clone the repository.

Introduction#

In this exercise you will use pytask for the first time. You will reuse code from in-class exercise 6 (the one on functional data management and plotting), so that you have little new code to write and can focus on the mechanics of using pytask.

Important#

You will find the solutions to all tasks in the folder solution. Write your own solutions in the folder exercise. Before executing pytask, cd into exercise or solution (depending on which one you want to look at). Do not execute pytask in the root of this repository.

Task 1#

Copy all function definitions from the solution to in-class exercise 6, task 1 into a file called task_clean_election_data.py.

Add a task function at the top of the file (after the import statements but before the function definitions) that does the following:

  • load the original dataset

  • load the metadata

  • call the clean_data function you copied

  • save the result in a bld folder under a suitable name.

Run pytask from a terminal in the exercise directory and verify that the task is executed correctly.
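
A task function along these lines might look as follows. This is a minimal sketch, not the official solution: the folder constants SRC and BLD, the file names, the CSV format, and the signature of clean_data are assumptions made for illustration. It also assumes a pytask version that reads dependencies and products from the default values of the depends_on and produces arguments.

```python
from pathlib import Path

import pandas as pd

# Hypothetical locations; adjust them to where the files live in your repository.
SRC = Path("data")
BLD = Path("bld")


def task_clean_election_data(
    depends_on={
        "data": SRC / "election_data.csv",
        "metadata": SRC / "metadata.csv",
    },
    produces=BLD / "election_data_clean.pkl",
):
    """Load the raw files, clean them, and store the result in the bld folder."""
    raw = pd.read_csv(depends_on["data"])
    metadata = pd.read_csv(depends_on["metadata"])
    clean = clean_data(raw, metadata)
    # Create the bld folder if it does not exist yet.
    produces.parent.mkdir(parents=True, exist_ok=True)
    clean.to_pickle(produces)


def clean_data(raw, metadata):
    """Stand-in for the function copied from exercise 6, task 1.

    The real function and its arguments may differ; this one only copies the data.
    """
    return raw.copy()
```

Because all paths arrive as arguments, the task function can also be called directly with other paths, which is handy for a quick check outside of pytask.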

Task 2#

The purpose of this task is to make sure that your dependencies are configured correctly.

Open a shell in the directory exercise and execute pytask.

  • Don’t make any changes and run pytask again. The result should be that all tasks are skipped because nothing has changed. If this is not the case, ask for help.

  • Now delete the created dataset and run pytask again. The result should be that your task runs again and the created dataset is stored on disk. If this is not the case, you have a problem in your product specification (in the produces statement). Debug it!

  • Now add a comment or docstring in task_clean_election_data.py and run pytask again. The result should be that the task is executed again. If this is not the case, you have a problem in your dependency specification (in the depends_on statement). Debug it!

Task 3#

Task 5 of in-class exercise 6 used the gapminder data. Create the file task_prepare_gapminder_data.py and add the following functions:

  • A function called _reduce_gapminder_data that takes the raw gapminder data as a DataFrame and returns a reduced DataFrame containing only the countries Algeria, Egypt, Sudan and South Africa.

  • A function called task_prepare_gapminder_data that downloads the gapminder data, calls the _reduce_gapminder_data function and saves the result under a suitable name in the bld folder.

No matter how little there is to do, you should implement your main logic in a function that works on Python objects (here DataFrames) and delegate all loading and saving to the task function.

Task 4#

Add a file called task_create_simple_lineplot.py and create the simple lineplot from in-class exercise 6, task 5.

Again, you can define all functions you need inside the task file, but you should define a plotting function that only works on Python objects (and returns Python objects).

Save the generated plot under a suitable name in the bld folder.

Task 5#

Add a file called task_create_highlighted_lineplot.py and add a function that creates the lineplot from tasks 8, 9 and 10 in in-class exercise 6.

Add a corresponding task function that imports data, calls the create_plot function and saves the generated plot under a suitable name in the bld folder.

Task 6#

Add a file called task_create_scatterplots.py that loops over a task function in order to create the three scatterplots from in-class exercise 6, task 4.
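
The looping mechanics might be sketched as follows, assuming a pytask version (0.4 or later) that provides the task decorator. The column names in SCATTERPLOTS and the helper _create_and_save_scatterplot are hypothetical placeholders; fill them with the variable pairs and the plotting code from exercise 6, task 4. The try/except block is only there so that the sketch can be read and run without pytask installed.

```python
from pathlib import Path

try:
    from pytask import task  # assumes pytask >= 0.4
except ImportError:

    def task(**_kwargs):
        """Stand-in decorator so the sketch also runs without pytask."""
        return lambda func: func


BLD = Path("bld")

# Hypothetical (x, y) column pairs; replace them with the three
# combinations plotted in exercise 6, task 4.
SCATTERPLOTS = {
    "first": ("x_column_1", "y_column_1"),
    "second": ("x_column_2", "y_column_2"),
    "third": ("x_column_3", "y_column_3"),
}


def _create_and_save_scatterplot(x, y, path):
    """Placeholder: load the data, build the plot and write it to path."""
    raise NotImplementedError


for name, (x, y) in SCATTERPLOTS.items():

    @task(id=name)
    def task_create_scatterplot(
        x=x,
        y=y,
        produces=BLD / f"scatterplot_{name}.png",
    ):
        """One task per loop iteration, told apart by its id."""
        _create_and_save_scatterplot(x, y, produces)
```

Passing the loop variables as default arguments binds their current values to each function, so every iteration defines a task with its own inputs and its own product.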

Task 7 (Bonus)#

Refactor your plotting code in task_create_highlighted_lineplot.py such that there is one main function that calls several private functions that do the actual plotting.

Note that when writing code for complex plots, you are allowed to use functions that modify a figure in-place, i.e., functions that have a side effect.